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We develop a general framework for data analysis and phenomenology of the CMB four-point 
function or trispectrum. To lowest order in the derivative expansion, the inflationary action admits 
three quartic operators consistent with symmetry: a 4 , a 2 (da 2 ), and (da) 4 . In single field inflation, 
only the first of these operators can be the leading non-Gaussian signal. A Fisher matrix analysis 
shows that there is one near-degeneracy among the three CMB trispectra, so we parameterize the 
trispectrum with two coefficients gff L and • i n addition to the coefficient g l ff c L of £ 3 -type local 

non-Gaussianity. This three-parameter space is analogous to the parameter space (/](?£, fj^Li Inl 1 ) 
commonly used to parameterize the CMB three-point function. We next turn to data analysis and 
show how to represent these trispectra in a factorizable form which leads to computationally fast 
operations such as evaluating a CMB estimator or simulating a non-Gaussian CMB. We discuss 
practical issues in CMB analysis pipelines, and perform an optimal analysis of WMAP data. Our 


minimum-variance estimates are g l f) c L = (—3.80 ± 2.19) x 10 5 , gf, L = (—3.20 ± 3.09) x 10 6 , and 

9nl? = (—10.8 ± 6.33) x 10 5 after correcting for the effects of CMB lensing. No evidence of a 
nonzero inflationary four-point function is seen. 


I. INTRODUCTION AND MAIN RESULTS 

Uncovering the nature of inflation is one of the most important open questions in our current cosmological 
model. Non-Gaussianity of the primordial density perturbations probes the interaction structure of the inflationary 
Lagrangian. Since interactions contain most of the information on the dynamics of the fields, the search for primordial 
non-Gaussianity has played a central role in constraining the physics of inflation. 

So far, the search for non-Gaussianity has been focused mainly on the bispectrum, or 3-point function. Limits 
on the inflationary 3-point function have been obtained following two different approaches. The first is based on 
providing templates for 3-point functions that are matched against the data, while the second approach attempts to 
reconstruct a generic 3-point function from the data. The first method has the advantage that it can be restricted to 
theoretically motivated models over which one can perform an optimal analysis, but the disadvantage of potentially 
missing a signal in the data simply because it was not looked for. It has been used to search for the (now) well-known 
local [1-4], equilateral [5], and orthogonal [6] template bispectra, plus some very recently identified higher-derivative 
bispectra [7]. The second approach (e.g. [8]) has the advantage of being sensitive to any potential signal, but the 
disadvantage that significance of a signal can be diluted away as many independent shapes are matched to the data. 

At present, the most constraining search for non-Gaussianity is provided by Planck [9], which finds no evidence 
of non-Gaussianity. While this analysis is a huge observational achievement, it should be stressed that from a particle 
physics point of view the limit is still rather weak. The skewness of the distribution of the primordial fluctuations is 
constrained to be smaller than about 10 -3 . This constrains inflation to be more or less as interacting as the electron 
in quantum electrodynamics, or as the pion at energies of order of its mass. It would be clearly very interesting 
to further constrain the level of non-Gaussianity by one or two orders of magnitude, a sensitivity that the recently 
developed effective field theory of large scale structure [10] has shown the potential to achieve with surveys in the 
next decade. 

The observational interest in non-Gaussianity is not just due to the fact that it is related to the dynamics of 
the theory. It additionally represents a very non-trivial signal. Because of translation and rotation invariance, the 
two point function of the primordial density perturbation is described by a scalar function of the modulus of the 
wavenumber k. Once we impose approximate scale invariance, this function can be described by a number, the 
amplitude, and another number, the slight deviation from scale invariance, the tilt. Instead, after assuming the same 
symmetries ( i.e . translation, rotation and scale invariance), the bispectrum is described by a scalar function of two 
scalar variables [11]. We pass from one single number to a full function of two variables. Clearly, a detection of 
such a signal would be an extremely non-trivial signature in the sky. It is the same information that describes 2-to-2 
scattering in a collider. When we pass to the trispectrum, the same symmetries make the trispectrum a scalar function 
of five variables. This is a fantastically non-trivial function that if we were so lucky to be able to see it in the sky, it 
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would offer tremendous constraining power on the physics of inflation. 

The analysis of the trispectrum, or four-point function, of the primordial density perturbation is less developed 
than the bispectrum. We briefly summarize existing results in the literature. The so-called g l ^ c L trispectrum is 
generated if the primordial curvature perturbation £(x) can be expressed in the form: 

C(£) = ( g {x) + ^gwlCgix) 3 (1) 

where is a Gaussian field. This leads to the following £-trispectrum: 

(C kl Ck 2 Ck 3 Ck 4 } = Y b 9N C L (P C (*h)P C ( W(fc 3 ) + 3 perm.) ( 2tt) 3 S 3 ( £ k„) (2) 

While it is impossible to obtain such a signal in single field inflation, as Maldacena’s consistency condition [12] 
generalized to the four-point function (e.g. [13]) shows 1 , there are technically natural multifield inflationary models 
that generate this signal without generating an observationally larger bispectrum [4]. If we call the additional light 
field cr’s, a measurable g^ c L can be enforced by imposing, just as an example, an approximate Zi symmetry of the 
cr’s or protecting them with an approximate supersymmetry [4]. Several groups have constrained g l § c L from WMAP 
data [15-18], and most recently [19] who use the optimal estimator. 

Another “local” four-point function is the rjvz,-trispectrum, defined by: 

(Ck!Ck 2 Ck 3 Ck 4 ) = TNLPc(k 2 )Pdk4)P({\ki + k 2 |)(27r) 3 <5 3 (^k i ) + (11 perm.) (3) 

The TjVL-trispectrum can arise if £ is a local quadratic combination of multiple uncorrelated fields. For example, 
suppose 


C( x ) = Cg(x) + ACg(x)<j(x) 


( 4 ) 


where A is a free parameter and £g, a are uncorrelated Gaussian fields with equal power spectra. In this model, the 
three-point function is zero and the four-point function takes the form (3) with tnl = A 2 . The parameter t^l has 
been constrained from WMAP [16, 18] and Planck [9] data. 

Going beyond the local-type signals g^ c L and tnl , the only primordial trispectrum which has been constrained 
is an “equilateral” trispectrum, which we will denote g^ L and define by: 
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( 5 ) 


where A^ is the amplitude of the £ power spectrum, defined by P^(k) = A^/k 3 . In the effective field theory description 
of inflation, this trispectrum arises from a quartic operator of the schematic form a 4 (we will be more precise in the 
next section). WMAP5 constraints on this trispectrum were reported in [17]. 

Now is a good time to explain our normalization convention in Eq. (5). Recall that in the bispectrum case, the 
bispectrum parameters (/_yl , Jnl 1 ) are normalized by fixing the bispectrum amplitude on equilateral triangles to have 
the same value as the local bispectrum with f l ^ c L = 1. Analogously, we normalize trispectra so that (CkiCk 2 Ck 3 Ck 4 ) = 
( 216/25)gNLA 3 /k 9 for tetrahedral 4-point configurations with |k,| = k and k, k j = — fc 2 /3 for i ^ j. This convention 
fixes all trispectra to have the same value on tetrahedrons as the local trispectrum with g'§ c L = 1. Another detail: 
in Eq. (5), and in Eqs. (6), (7) below, we write the trispectrum in two forms, either with a time integral which 
is unevaluated, or after evaluation of the integral. We do this because the first form will be directly useful when 
obtaining factorizable representations for the trispectra, as we will explain later. 


1 The fact that Maldacena’s consistency condition gives a non-zero bispectrum or trispectrum in the squeezed limit should not be regarded 
as predicting a non-vanishing physical signal in that limit. It is indeed the way to write in comoving coordinates the fact that there 
is no physical correlation among modes of different wavelengths: a local observer can obtain the same result for a local experiment by 
starting with vanishing super-Hubble correlations. See discussion about this in [14], 
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In this paper, we will introduce two new trispectrum shapes which correspond to quartic operators of the form 
a 2 (dia) 2 and (dia) 2 (dja) 2 in the effective field theory of inflation. Following our normalization convention above, we 

(da ) 2 


define parameters g EE ° 


and g^l ] 


by: 
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(7) 


where A' = fci + fc 2 + fc 3 + fc 4 . In this paper, we will implement the optimal CMB estimator for four trispectra: 

9nli 9nl > ' T ' > > an d 9NL 1 ■ (The tnl trispectrum requires slightly different techniques for reasons that will be 
apparent later, so we have omitted it in this paper.) Searching for these four trispectra is analogous to searching for 
the standard bispectra f l ^ c L , an d Inl^- 2 

There is a basic computational problem which arises for computational operations with trispectra, for example 
applying an estimator to CMB maps, or computing a Fisher matrix. Naively, these operations have computational cost 
^(^max)> which is prohibitive for a large experiment like WMAP or Planck with ^ max ~ 10 3 . The same computational 
problem arises for the bispectrum, where it has been solved using the idea of factorizability [3, 5, 8, 20-22]. If 
a bispectrum can be represented as a sum of terms which satisfy a suitable factorizability condition (the precise 
condition is given in Eq. (38) below), then computational cost is dramatically reduced. A variety of general strategies 
have been proposed for making bispectrum data analysis computationally feasible (e.g. [8, 21-24]); while the details 
of these proposals are very different, they can all be viewed as different strategies for representing a bispectrum as a 
sum of factorizable terms. Analogously for the trispectrum, we will formulate a suitable definition of factorizability, 
show that it leads to dramatically reduced computational cost, and give a physically motivated, Feynman diagram 
based prescription for representing inflationary trispectra in factorizable form. This will allow us to analyze the local, 
tr 4 , & 2 (da) 2 , and (da) 4 trispectra. 

Among other things, factorizability means that we can do a Fisher matrix analysis of correlations between 
trispectra. We will show that there is one near-degeneracy among the four trispectra. To quantify this, the a 2 (da) 2 
trispectrum is 99.2% correlated with a suitably chosen linear combination of the a 4 and (da) 4 trispectra. Therefore, 

we will eliminate the parameter , and reduce our set of trispectra to three: g l E c L , g^L^ an< ^ 9nl? ■ 

We will construct trispectrum estimators and present details of analysis pipelines which are suitable for realistic 
experiments such as WMAP or Planck. We would like to emphasize three technical issues from the outset. 

First, the trispectrum estimator is potentially very sensitive to modeling errors in the two-point function due to 
slightly incorrect cosmological parameters, detector noise properties, or beams. Suppose the trispectrum is estimated 
assuming covariance matrix Co, but the true covariance is Ct rue = Co + AC. The trispectrum estimators we use will 
have the property that the resulting bias is parametrically C((AC) 2 ) rather than O(AC). This property turns out 
to be critical in practice. 

The second issue is that many technical tricks are necessary to reduce the number of Monte Carlo simulations 
in the trispectrum estimation pipeline to a reasonable level. We will find several situations where an “obvious” 
Monte Carlo procedure is slow, but there is an alternate Monte Carlo procedure which is faster (examples include 
Eqs. (66), (83), and (F2)). 

Third, gravitational leasing and other secondary effects (such as contamination by residual infrared sources) 
generate a nonzero trispectrum which must be subtracted. The lensing trispectrum has been measured in ACT [25, 26], 
SPT [27, 28], and Planck [29], with recent measurements approaching 40 ct! Although lensing is an interesting source 


2 Recall that the space of bispectra spanned by f^ L , /“|, h is equal, by a linear transformation, to the space generated by the cubic 
operators <r 3 and aider) 1 [6]. 
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of cosmological information, in this paper our focus will be on the primordial trispectrum, so we will treat lensing as 
a large contaminant whose bias must be subtracted when estimating other trispectrum shapes. 

We will conclude by performing an optimal analysis of WMAP data. We find the following constraints (all 95% 

CL): 

(-8.18 x 10 5 ) < g§ c L < (0.58 x 10 5 ) 

(-9.38 x 10 6 ) < g^ L < (2.98 x 10 6 ) (8) 

(-2.34 x 10 6 ) < g^ < (0.19 x 10 6 ) 

We find no evidence of primordial trispectra, and the error bars agree with Fisher matrix forecasts. 

II. MINI-REVIEW OF EFFECTIVE FIELD THEORIES OF SINGLE AND MULTIFIELD INFLATION 

In this section we briefly review the particle physics motivation for studying the trispectra that we analyze. We 
do this by using the effective held theory of inflation [30] and of multifield inflation [4]. 

We start from single held inflation. By assuming that inflation is an early phase of the universe characterized by 
a spontaneous breaking of time diffeomorplrisms, it is possible to construct a model independent Lagrangian for the 
fluctuations. Furthermore, in inflation we are interested in computing correlation functions at an energy scale around 
the Hubble scale during the early quasi de Sitter phase. Often, this energy scale is high enough to write the action in 
the so-called decoupling limit, where the Lagrangian takes a very simple form: 

S n = j - Mp\H (<9 m 7t) 2 + 2M| ^t 2 + 7t 3 - 7T ^-^ + (d^n) 2 (<%7r) 2 ^ 

-^-(87r 3 + 12 7r 2 (a /i 7 r ) 2 + -") + ^( 16 7r 4 -|- 3 27r 3 (^7 r ) 2 + •")+••• . (9) 

where • • ’ represents higher order terms, higher derivative terms, and slow-roll suppressed terms. Here spatial indexes 
i are contracted with the -tensor, while space-time indexes g, are contracted with the FRW metric . The field 
7 r represents the Goldstone boson of time translations. It is related to the curvature perturbation £ as 

( = —Htt + (higher-order terms). (10) 

The non-linear realization of time-diffeomorphisms forces the appearance of 7r into non-linear blocks. Simple inspection 
of the action shows that it is impossible to have a four-point function induced by operators of the form Tr 2 (d^Tr) 2 and 
(<9 M 7r) 2 (d u 7r) 2 that is the leading non-Gaussian signal: when these operators are turned on, there is always a cubic 
operator that induces a bispectrum with much higher signal-to-noise ratio [31]. At the level of the leading derivative 
operators, the only term that has a chance of producing a trispectrum without a bispectrum with a large signal-to- 
noise ratio is the operator 7T 4 . One should be careful about radiative corrections though. The non-linear realization 
of time-diffeomorphisms forces the presence of a quintic operator ■?r 3 (9 Ai 7r) 2 together with 7r 4 . One is naturally lead to 
wonder if this operator will induce, under radiative corrections, a cubic operator that dominates the signal. It turns 
out that the relative coefficients of 7T 4 and 7r 3 (9 Ai 7r) 2 are fixed by time-diffeomorphism invariance in such a way that, 
when the signal-to-noise in 7r 4 is large (gJf L 10 5 ), the radiative corrections induced by the quintic operator generate 
at most a cubic operator 7T 3 and 7r(i9 At 7r) 2 with an /ml ~ 1, and therefore subleading [31]. This can be interpreted 
as an approximate Zi symmetry of the inflaton. 3 We therefore conclude that it is possible to have a trispectrum 
induced by 7T 4 as the leading non-Gaussian signal. A constraint on this trispectrum can be directly mapped, in the 
context of single held inflation, into a constraint of the coefficient Mf of (9), as already done by the WMAP and the 
Planck experiments for the coefficients M| and Mg from analysis of /)[?)] 11 and /^ h ° s [6, 9, 34]. 

It is also possible to have higher derivative interactions leading to large non-Gaussianities directly in the form of a 
trispectrum from interactions with more than four overall derivatives, very schematically of the form (<9 2 7 t) 4 [7, 31, 35- 
38]. As for the case of the three-point function, where the same phenomenon appears, the signal can be made detectable 


3 The fact that the coefficient of it 4 is unrelated to the coefficients of the cubic terms had been already noticed in [32, 33] for a subclass of 
the models we consider with the effective field theory consisting of scalar field Lagrangians of the form P((d<f>) 2 , <j>). However, without 
the identification of a mechanism protecting the generation of cubic terms, it is hard to imagine why one should concentrate on the 
particular Lagrangian allowing for a large quartic operator and small cubic ones. 
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only by lowering enough the unitarity bound of the theory, and it is furthermore possible to have strong degeneracies 
with shapes with fewer derivatives. This makes the prospects of detection somewhat more unlikely. We leave the 
study of these shapes to future work. 

We now pass to multifield inflation. An effective field theory description of multifield inflation can be constructed 
after realizing that the predictions of multifield inflation largely do not depend on the background solution, with scalar 
fields developing possibly complicated trajectories in field space, but simply on the Lagrangian of the fluctuations. 
This Lagrangian can be simply constructed by coupling additional light degrees of freedom to the Goldstone boson 
of time-translations 7r. The main difference between single field and multifield inflation is that while in single field 
inflation the relationship between the Goldstone boson 7r and the curvature perturbation £ is fixed by the background 
cosmology as in Eq. (10), the same is not true for the effect of the additional inflationary fields on (. How much a 
given fluctuation of the additional fields contributes to the curvature perturbations depends on the whole trajectory 
of the fields from the time a mode crosses the horizon to reheating, and also on the details of the reheating epoch. 
However, the fact that these effects happen when all the modes of interest are outside of the horizon and gradients 
are therefore negligible (see Fig. 1) permits a crucial simplification [4]: the relationship between the fluctuations of 
additional scalar fields, oy, and £, must be local in space, and since fluctuations are quasi-Gaussian, the relationship 
can be Taylor expanded. We are therefore led to 


C( x ) 


—H tt(x) + 





I ( 

2 ! \ dai da j 


a I {x)(jj{x) 

o 


( 11 ) 


where ( d n C > /ai 1 .. .9oy n )o are numbers representing the Taylor expansion of the generic relation, local in real space, 
between £ and ay, ((x) = f(cr/(x)), around the point oy = 0. This relationship, developed in [4], generalizes in a 
non-trivial way the so-called SN formalism of [39-42]. 

At this point the problem of writing the effective field theory of multifield inflation is reduced to writing a 
Lagrangian for the additional light fields present during inflation, possibly coupled to the Goldstone boson n. Clearly, 
there is some freedom in what kind of fields we decide to include. In this paper we will content ourselves with the fields 
studied in [4], although it would be interesting to study additional possibilities. There, the additional fields that were 
included were scalar fields oy which generate curvature perturbations after horizon crossing, and not just through 
their effect on n. 4 To ensure that quantum corrections are small and do not make the mass of these additional 
fields large, it was postulated that these fields were either the Goldstone bosons of some global symmetry, abelian or 
non-abelian, or they were protected by an approximate supersymmetry. 5 

These different mechanisms that protect the lightness of the additional scalar fields from quantum corrections 
can lead to distinguishable signals that, if detected, might allow us to infer the mechanism protecting the lightness of 
these fields, as described in detail in [4], to which we refer for details. Unfortunately, these mechanism-specific signals 
appear only as either subleading signals that have lower signal-to-noise ratio than other ones that should be detected 
first, or as signals appearing in correlation functions involving isocurvature fluctuations. Unfortunately, the leading 
signal is not able to distinguish among the various mechanisms protecting the lightness of the additional fields. Since 
in this paper we will restrict to adiabatic fluctuations, and since we are just trying to detect the leading signal, we can 
neglect all these distinctions, and we can focus on the following Lagrangian, which is common to all three mechanisms 
above (Abelian Goldstone bosons, non-Abelian Goldstone bosons, and supersymmetry): 


S a = d 4 x y/-g 


(<V) 2 + + Ji° r 2 (<9 iC r) 2 + ^(^fT) 2 {djdf + jgd 4 + 


( 12 ) 


This Lagrangian reproduces the relevant features that are contained in the models in [4]. First, notice that we did not 
write any cubic terms that would give rise to a bispectrum signature. These can be suppressed with some symmetry, 
such as for example a Z 2 symmetry a —► —a or by imposing a Lorentz invariance in the theory, as described in [4]. 
The quartic couplings are suppressed by scales A 12 , 3 , the smallest of which represents the unitarity bound of the 
theory. The first three interactions are compatible with a shift symmetry of the a field, and their coefficients are all 
independent. This means that they can generate observable templates associated to the operators a 4 , a 2 (did) 2 and 


4 This means that our discussion does not include models of the class of the so-called quasi-single field inflation [43, 44] 

5 Supersymmetry is broken during inflation minimally only by the Hubble scale H, which means that radiative corrections to the super¬ 
potential vanish above the scale. For weakly coupled theories, where loops are suppressed by a weak coupling parameter, this makes 
radiative corrections perturbatively small [4], 
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(di<r) 2 (dja) 2 . Note that the operator a 4 generates the same trispectrum as the operator jr 4 considered previously in 
the single field case. The operator a 4 is present in the case the a fields are supersymmetric, or when the the symmetry 
that the Goldstone bosons cr’s non-linearly realize is softly broken. This operator gives rise to a bispectrum of the 
local form which, again, cannot be generated in the single field case. The signal produced by the a 4 term in the 
Lagrangian can give rise to a much larger signal than the one associated to an f l fi c L of order unity. On top of these 
contributions, there are the ones associated to the non-linear relation between Q and cr’s in Eq. (11). They give rise 
to bispectra and trispectra of local type. 

Finally, we notice that one can enforce a particular symmetry in the case of multifield inflation, where non- 
Gaussianities are generated in a theory where Lorentz invariance in the multifield sector is left unbroken [4]. In this 
case, only two operators survive, (9 /i <j) 2 (9 1 y(T) 2 and a 4 . This symmetry can be explicitly checked to be mapped into 
a conformal symmetry of the three-dimensional templates [45, 46]. There are finally additional trispectra, as for 
example a 2 (da) 2 , associated to soft breaking of the some internal symmetries or to supersymmetry [4] whose analysis 
we defer to a subsequent work. Additional interesting studies, including some very early ones, for the inflationary 
trispectrum, both in single field and multifield inflation, can be found in [47 77]. 

We conclude this section by relating parameters in the above Lagrangians to the coefficients defined in the 
introduction. For the case of the single field Lagrangian (Eq. (9)), a short calculation using the in-in formalism [12] 
shows: 


J* = ^Mf 3 
9nl 288 H 4 c s ' 


(13) 


For the multifield Lagrangian (Eq. (12)), we find: 


9nlA( 


25 H 4 
768 Af ’ 


0 - 2 (^) 2 4 , _ _ 
9nl — 


325 H 4 
6912 A 4 ’ 


a O-Y V _ 

9nl — 


2575 H 4 
20736 Af ‘ 


(14) 


■ 4 * 2 / o \ 2 / a \ 4 

Notice that, as expected, (g^ L A^), (g^^ a A^), and (g ( N ^ A^) scale as ( H/A ) 4 , being generated by dimension 
eight operators. The Lorentz invariant trispectrum generated by the operator (d tl a) 2 (d u a) 2 is obtained by setting 
Af = Af = — 2Af. 

Finally, for the local trispectrum, we get either: 


9n c lA c 


_50^ 

27 A 4 e 



(15) 


in the case where the trispectrum is generated by a a 4 interaction in the multifield action (Eq. (12)), or 

25 H e ( d\ \ (cK_\ (dC_\ ( <K \ 

54 A 3 V da T dajda K J 0 \da r J 0 \ Daj ) 0 \ da K ) 0 


9nl = 


(16) 


in the case where the local trispectrum is generated by the conversion mechanism in Eq. (11). 


III. THE CMB TRISPECTRUM AND ITS OPTIMAL ESTIMATOR 


A. Toy model 


Before diving into the complications of the CMB, it may be illuminating to construct the optimal trispectrum 
estimator for the following toy model. Let aq, • • • , Xjy be independent identically distributed random variables whose 
distribution is nearly Gaussian, with mean zero, known variance cr 2 , and small kurtosis n <C cr 4 which we would like 
to estimate. Thus the two-point and four-point functions are: 

(X'j Xj ) — (T Sij \Xj X'j Xy- X-i) — fT (SijSkl T A A/ Ay I, ) ~\~ KSijSjk^kl (17) 

It may seem natural to estimate n using the simple estimator: 


1 

N 



-3(J 4 


(18) 
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FIG. 1: Representation of a typical multifield potential. Modes of interest for observation cross the horizon about sixty e- 
foldings before the end of inflation. Therefore, effects coming from the evolution of the fields after horizon crossing can be 
treated locally in real space. The effective theory is more general than this example, as it does not assume that the inflaton is 
a scalar field. This example is however interesting in helping in visualizing the different scales in the problem. 


However, this estimator is suboptimal! The optimal (minimum variance) estimator turns out to be 

^ = jf jA ^j - 6tj2 jf x ^j + 3(j4 ( 19 ) 

A short calculation using Wick’s theorem shows that 

^ 96er 8 ^ 24tr 8 

Var(At na ive) = Var(/s 0 pt) = — (20) 

so the naive estimator K n aive is significantly suboptimal. 

In addition to having lower variance, the optimal estimator has another property which is crucial in practice. 
Suppose that the variance ct 2 is not precisely known in advance, but has been estimated with some error A a 2 = 
a est ~ a true- Let us compute the “two-point bias” in our estimate of k due to the incorrectly estimated variance. A 
short calculation gives: 

(Knaive) =K~ 6ct 2 (A<7 2 ) + 0(A(T 2 ) 2 («opt) = K + 3(Aer 2 ) 2 (21) 

In other words, the optimal estimator is parametrically more robust (by one power of Act 2 ) to errors in our estimates of 
the two-point function. This extra robustness will be important when we generalize to the CMB, where beams, noise, 
and residual foregrounds all contribute to the two-point function and are notoriously difficult to estimate precisely. 


B. CMB estimator 


Let us first establish some notation. We denote the angular four-point function or trispectrum of the CMB by: 


r ^'m 1 rnlrn 3 m i — a fem 2 8 <3"i3 fl 4m 4 ) c 


— (^i772i ^t 2 772 2 ) (^37723 ^47724 ) 

-( a C 7711^37713) ( ^-£2 7712 ^^4 7714 ) (^2^2 ^3^3 ) 


= (a*! mi ^t 2 777 2 6^37723 6^47724 ) 


(_l) m i +m3 Ce 1 Ce 3 Se ie2 Se 3 e i 5 rni - rn3 5 m2 ,-rn i + (2 perm.) 


( 22 ) 


The trispectrum is invariant under the 4! permutations of its indices (^,TOj), and satisfies the reality 

condition 


_ /_1 \ 772!+7722+7723+7224/^1^2^3 A 

- L m 1 m 2 m 3 m i 1 J (-m 1 )(-m 2 )(-m 3 )(-m 4 ) 


( 23 ) 









We will only consider CMB trispectra which are rotationally invariant, i.e. is unchanged if a common 

rotation is applied to all four pairs of indices (£j,TOj). This means that the trispectrum has fewer degrees of freedom 
than the index notation would suggest. (It is possible to devise alternate notation which makes this more 

explicit [78], but we will not do so in this paper.) 

Now consider a CMB experiment in which the instrumental response is linear and the noise is Gaussian. The 
observed CMB a^ m is a sum of signal and noise contributions: we have ae m = se. m + ne m , where se m is the true CMB 
and ri( m is Gaussian noise. Let Ce imit e 2Jn , 2 = (ae. 1 mi a t 2 m 2 ) be the total (signal + noise) covariance of the observed 
CMB. Note that although the signal contribution to C will be diagonal in (£, m), the noise contribution will generally 
be nondiagonal. 

Armed with the above notation, the optimal trispectrum estimator can be written in the following general 
form [79]: 
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where the normalizing constant F is given by 
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Let us now interpret the terms in the optimal estimator above. The first (quartic) term on the RHS of Eq. (24) is a 
sum over 4-tuples (t), nrii) in which each 4-tuple is weighted by the template signal 7^^^, and inversely weighted 
by the total covariance C. This type of weighting appears in a variety of optimal CMB estimators, for example optimal 
estimators for the power spectrum or bispectrum. The second (quadratic) and third (constant) terms in Eq. (24) 
parallel the terms found previously for the toy model in Eq. (19). We note that a similar structure occurs in the 
optimal estimator for the three-point function, where there is a one-point term in addition to the leading three-point 
term [5]. 

As in the toy model, the additional terms in Eq. (24) reduce the variance, and also make the estimator more 
robust to errors in the two-point function. To make this last point more precise, if the total covariance C is estimated 
incorrectly with nonzero error AC, then it is easy to show that the bias in the estimator is parametrically C((AC) 2 ) 
rather than C(AC). This property is critical in practice. If we used an estimator whose bias is parametrically O(AC), 
we would need to model beams, noise bias, residual foregrounds, etc. with fractional accuracy 1/fmax ~ 0.1%. This 
level of accuracy is extremely difficult to achieve for an experiment as complex as Planck. On the other hand, with an 
estimator whose bias is parametrically (D(AC 2 ), the required fractional accuracy is ss 1/^max or a few percent, which 
is easily achieved in practice. 

A short calculation shows that the variance of the optimal estimator is 

Var(£) = (26) 

i.e. F determines both the normalization of the estimator and its variance. 


C. The Q-symbol 


We now define notation which will be used ubiquitously throughout the paper. Given a CMB trispectrum 


T 


777.1771277137714 


and CMB realization a^ m , we define the “Q-symbol” Qr[a :] by: 




1 ^2^3^4^ 
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(27) 


tiVUi 


The reality condition (23) for the trispectrum, together with the reality condition a\ m = (—l) m a^(_ m ), implies that 
Qt[o] is real. Note that a similar notation T[a] was defined for the bispectrum in [21]. 



9 


This notation is useful since most of the machinery in this paper can be written purely in terms of the Q-symbol. 
Therefore, our machinery applies to a trispectrum if a fast algorithm exists for evaluating its Q-symbol. For example, 
the optimal estimator from the previous section can be written as the following Monte Carlo average: 

S[a] = (Q[C , - 1 a,C'- 1 a,C , - 1 o,C" 1 a] - 6 ^Q[C'~ 1 a,C'- 1 a,C , - 1 6 ,C'- 1 6 ]^ +^Q[C'- 1 6 ,C , - 1 6 ,C'- 1 6 ,C , - 1 6 ]^ ) (28) 

where (•)(, denotes an average over Gaussian random realizations b with covariance matrix C. In §VI we will develop 
fast algorithms for computing Fisher matrices, and in §IX we will present detailed data analysis pipelines, under the 
assumption that Qt[o] is computable. We will also present an algorithm for simulating a non-Gaussian map with 
specified trispectrum, although we defer this to Appendix B since it is somewhat peripheral to our goal of analyzing 
WMAP data. 

We define the gradient de m QT[a] by: 


dimQrla] 


dQr [a] 

da tm 


E rpitxi^iz * * * 

mmim2'ni3 u ximi Uj i2‘cri2 ^3 m 3 


(29) 


The object de m QT[a] is a harmonic-space map, as the index notation suggests. It transforms covariantly under 
rotations, in the sense that dQ[R - a] = R ■ 9Q[a], where (R ■ a) denotes the action of a rotation R G £0(3) on a 
harmonic-space map ae m - 

We will sometimes omit the subscript T, and simply write Q[a] or de m Q[a], if the trispectrum is understood. 
It will also be convenient to define the following generalizations of Q and dQ, which are functions of four CMB 
realizations (a, b, c, d) and three realizations (a, b, c) respectively: 
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(30) 


D. An alternate approach? 

Let us temporarily return to the toy model from III A. We construct an interesting near-optimal trispectrum 
estimator as follows. Suppose we use the naive estimator K na i ve , but estimate the variance a 2 internally from data, 
rather than assuming a priori knowledge of a 2 . In other words, consider the pure four-point estimator: 



It is not hard to show that the variance is 

24 a 8 ( 3 \ 

VarM = (l + (32) 

Comparing with the result for the optimal estimator (Eq. (19)) we see that K a it is near-optimal, in the sense that its 
variance agrees with the optimal estimator to leading order in 1/N. This estimator also has the property that its 
two-point bias (due to incorrectly estimated a 2 ) is zero! The estimator K a i t is only sensitive to the four-point signal 
k, with no dependence on the variance a 2 . We note that this statement does assume that the covariance matrix of 
the Xi is proportional to the identity matrix, and there is no estimator which has zero bias for an arbitrary covariance 
matrix C t j. Nevertheless it is interesting that a zero-bias estimator exists for a restricted form of covariance matrix, 
and natural to ask whether this generalizes to the CMB context. 

Ideally we would like to construct a CMB trispectrum estimator which is unbiased if either (1) the isotropic 
signal power spectrum Ce, or (2) the noise covariance is estimated incorrectly. We speculate that it is possible to 
give a general construction of such an estimator. Noise bias can be eliminated by dividing the data into subsets 
with uncorrelated noise, making maps (ai)^ m , ( 02 )^, • • • from one subset at a time, and allowing only “cross” terms 
Q{ai,a,j,ak,ai\ with ( i,j,k,l ) distinct. Signal bias can be eliminated by estimating Ce directly from the data and 
subtracting a term which is quadratic in the estimated Ce s, by analogy with the toy model case above. Such an 
estimator would be very useful e.g. for the gravitational lensing four-point function, where a variety of noise bias 
cancelling schemes have been proposed [25, 80-82]. However, we defer this topic for future work. 
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IV. 3D ->• 2D PROJECTION 


We will often be interested in “primordial” trispectra, that is, CMB trispectra which arise by linearly evolving a 
physically motivated four-point function in the 3D adiabatic initial curvature (. The (-trispectrum is defined by: 


(CkiCk 2 Ck3Ck 4 )c — ((ki(k 2 (k 3 (k 4 ) (Cki (k 2 ) (Ck 3 (k 4 ) (Ck 4 Ck 3 ) (Ck 2 Ck 4 ) (Ck 4 Ck 4 ) ((k 2 Ck 3 ) 


— (Cki Ck 2 Ck 3 Ck 4 ) 


P(ki)P(k 3 )(2ir) b 6 3 (k 1 + k 2 )<r(k 3 + k 4 ) + (2 perm.) 


(33) 


We will also use the “primed” notation (CkiCk 2 Ck 3 Ck 4 ) / to denote the (-trispectrum without its momentum-conserving 
delta function, i.e. 

(Ck 1 Ck 2 Ck 3 Ck 4 >c = (Ck 1 Ck 2 Ck 3 Ck 4 ) / (27t) 3 5 3 (J2 ki) (34) 

We can project a (-trispectrum to an angular CMB trispectrum as follows. Recall that the CMB multipoles ai m are 
related to the initial curvature ( by: 

/ d 3 k ~ 

j^A e (k)((k)Y e * m (k) (35) 

where the transfer function A e(k) defined by this equation can be computed numerically using CAMB [83]. The 
following general trispectrum projection formula follows immediately: 


r T'£ 1 ^ 2 ^ 3^4 
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Plugging this into the definition (27) of the Q-symbol, we get the following expression for Qt [a]: 

4 / 


Qt[o] = 


d 3 ki d 3 k 2 d 3 k 3 d 3 k 4 
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(36) 


(37) 


Since the functional form of Qt[o] uniquely determines this expression for Qt is equivalent to the 

projection formula (36) for T. In fact, throughout the paper we will often find it more convenient to specify a 
trispectrum T by giving a formula for Qt[o] than by an explicit expression for 


V. FACTORIZ ABILITY 

Evaluating Qt[o\ directly from its harmonic-space definition (27) is computationally intractable, since the number 
of terms in the sum is G(^ ax ), where £ max = 0(1O 3 ) for WMAP or Planck. An analogous computational problem 
arises in analysis of the CMB bispectrum, where it has been solved using the idea of finding a factorizable representation 
of the bispectrum. We start by briefly reviewing factorizability for the bispectrum, in notation which will set the 
stage for the trispectrum discussion to follow. 


A. Review of factorizability for the bispectrum 


A CMB three-point function is said to be factorizable if it is a sum of terms of the form: 


Affa. 


E A l B L c L £™ 3 m 3 + (s p erm -) 


(38) 
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where are -/Vf ac t-by-f max real-valued matrices, and i s the Gaunt symbol, defined by: 


Qt i « 2 £ 3 
777.1 777.27713 


= J d 2 n Y timi (n) Y<? 2m2 (n) Y^ m3 (n) 

( 2 £i + l)(2l 2 + l)(2l 3 + 1) / j 1 4 4 
47t \ 0 0 0 


( 39 ) 
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The significance of the factorizability condition is that it makes the bispectrum estimator computationally feasible. 
First recall [21] that the bispectrum estimator can be written in terms of the T-symbol, defined by: 


T[a} 


6 ^ 

Zirrii 


{at imi a^ 2 m2 ) ^imi^2™2%ni3 


(40) 


This harmonic-space sum is computationally infeasible, but if the bispectrum satisfies the factorizability condition (38) 
then T[a} can be rewritten: 




4l^i at^rnx T), mi 




' C't 3 a t 3 Tn 3 Yt 3 rn 3 
C 3 m 3 



(41) 


which is straightforward to evaluate efficiently using fast spherical transforms. For this reason, finding a factorizable 
representation for a given bispectrum is the key to making data analysis practically feasible. 

Many CMB bispectra of interest arise from 3D—>2D projection of a ((-bispectrum (CkiCk 2 Ck 3 )- There is also a 
useful notion of factorizability for a ((-bispectrum as follows. A <(-bispectrum (CkiCk 2 Ck 3 ) is said to be factorizable if: 


N fact 

(CkiCkoCka)' = g X! a /( fc i)/3/(fc2)7/(fe) + (5 perm.) 
0 1—1 


(42) 


where ai(k), fii{k), y(/c) are arbitrary functions. A general 3D—>-2D projection formula for bispectra, very similar to 
the one given for trispectra in the last section, shows that the corresponding CMB bispectrum is: 
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(43) 


This equation shows that if we approximate the r integral by a finite sum, we obtain a CMB bispectrum which is 
factorizable in the sense defined by Eq. (38). Thus a £-bispectrum which is factorizable gives rise to a CMB bispectrum 
which is also factorizable, although the number of terms will increase by a large factor, since many points will be 
needed to approximate the r-integral. 

A variety of general schemes have been proposed in the literature for making bispectrum data analysis computa¬ 
tionally feasible (e.g. [8, 21-24]). These schemes can all be viewed as different proposals for representing a bispectrum 
as a sum of factorizable terms. Some methods operate directly on the CMB bispectrum, for example the binned esti¬ 
mator in [22] uses bandpowers in £ to define basis functions Aj, Bj ,(7/. Other methods operate on the ((-bispectrum 
before 3D—»2D projection, for example by expanding the ((-bispectrum in a set of orthogonal basis functions [8, 24], 
Finally, in some cases it is possible to find an approximate factorizable representation as a pure ansatz. The canonical 
example is the equilateral bispectrum [5], where the factorizable template: 

(fci -I- k 2 — k 3 ){k 2 + k 3 — ki){k 3 + k 3 - k 2 ) 

(CkAksCka) - -p 7.3 7,3- ( 44 ) 

is 99% correlated to the exact bispectrum of the operator k 3 . 

In this paper, we will concentrate on a “physical” approach to factorizability which generalizes nicely to the 
trispectrum and also provides some physical interpretation. The idea is that the Feynman diagram which one evaluates 
to compute a given bispectrum automatically supplies a factorizable representation. To illustrate this idea by example, 
consider the 7r 3 bispectrum: 


(Cki Ck 2 Ck 3 ) CK 


1 

kik 2 k 3 {ki + k 2 + k 3 ) 3 


(45) 


This bispectrum does not appear to be factorizable. However, let us go back to the Feynman diagram which produced 
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Here, te is Wick-rotated conformal time, which we take throughout this paper to run from te = — oo to 0. We see 
that the integrand in Eq. (46) is factorizable in k\,k 2 ,k^. This is not a coincidence: it arises from combinatorics of 
the Feynman diagram, since each factor corresponds to one external line. If we do not evaluate the te integral, but 
instead approximate it by a finite sum of te values, then we will obtain a factorizable C-bispectrum. As shown in [21], 
the number of terms in the sum can be kept manageable by sampling the integral with equal spacing in log \te\- This 
trick is general and shows that any CMB bispectrum which arises from a cubic diagram of the combinatorial type 
shown in Eq. (46) is factorizable, although the number of terms in the CMB bispectrum may be large, since we get 
one term for every sampling point needed to do the (te,t) double integral. 


B. Factorizability for the trispectrum 

We would like to define a notion of factorizability for the trispectrum , in order to make data analysis of primordial 
trispectra computationally feasible. 

We have just seen that in the bispectrum case, the notion of factorizability derives from the combinatorics of the 
Feynman diagram. In the trispectrum case, the trispectrum can come from either a “contact” diagram with a quartic 
vertex, or an “exchange” diagram with two cubic vertices: 



Accordingly, we will define two different factorizability conditions for the trispectrum, “contact factorizability” and 
“exchange factorizability”. We will give the precise definitions shortly, but there is one feature which can be anticipated 
in advance. One might think (by analogy with the bispectrum case) that a contact diagram always gives rise to a 
trispectrum of the form 


(CkiCi^CkaCkJ' = [ dr E a(k 1 )0(k 2 )'y(k 3 )S(k4) 

J —OO 


(49) 


in which the integrand is a factorizable function of fc, = k, |. However, this is not fully general: if the quartic operator 
contains spatial derivatives, then there will be additional factors (k, • k,), as can be seen by inspection of Eqs. (6), (7) 
in the introduction. In the bispectrum case, we would be able to get rid of a factor such as (ki • k 2 ) by using the 
momentum-conserving delta function <5 3 (ki + k 2 + k 3 ) to write (ki • k 2 ) = (/cf — k\ — k 2 )/2 and reduce to the case in 
which the integrand is factorizable in ki = k, |. In the trispectrum case, there is no analogous way to eliminate factors 
(k ( ■ kj). The consequence is that our definition of factorizability will contain non-scalar quantities, i.e. 2D fields with 
spin s > 0. This is a significant complication compared to the bispectrum case. For reference, the mathematics of 
spin-s fields is briefly reviewed in Appendix A. 


C. Contact factorizability 


In this section we give the formal definition of contact factorizability. We define an angular trispectrum 
TmimZmami t 0 be “contact factorizable” if: 


rp£ 1 ^ 2 ^ 3^4 
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(50) 
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where ( s Wm) denotes a spin-s spherical harmonic (Appendix A), ai , /3j, 7 /, <5/ are integer spins, and Aj, Bj,C £, Dg 
are -/Vf act -by-£ max matrices. We will assume that ay + /3j + 7 / + Si = 0 for each I so that the integrand in Eq. (50) 
has total spin zero. 

Equivalently, we can define a contact factorizable trispectrum by the functional form of Qt [a]. 

Qt[o-] = 'y ] I d n \ ^ (01 ^ £imi ( n )) I ( 'y ' Be 2 ae 2 m 2 {/ 3 I Ye 2 rn 2 (n))'\ 

0 /=i J \e imi ) \e 2 m 2 ) 

x ( y ' ^( 3 a i3m3{'YiYe 3 m3 ( n )) j ( y ' {s I Y£ iTni (n)) j + c.c. (51) 

\< 3 m 3 / \i4m4 / 

This definition of factorizability is useful because it is specific enough that there is a fast algorithm for evaluating Q[a] 
(by straightforward use of Eq.(51) with fast spherical transforms), yet general enough that many physically interesting 
trispectra are contact factorizable. 

We illustrate this by example, by showing that the operator a 2 {da) 2 from the previous section is contact factor¬ 
izable. We start from the “unintegrated” form of the trispectrum in the first line of Eq. ( 6 ): 

(Ck 1 Ck 2 Ck 3 Ck 4 )c = - 1 t^9nl > ' 7) A c J dT E t e fc 3 fc 3 fc3 fr 4 (* ~ fc Ws)( 1 - k 2 T E )(k 1 ■ k 2 )(27r) 3 <5 3 (E k i) + 5 perm. 

(52) 

We plug this into the 3D—>2D projection formula (37) to obtain an expression for Q[a]. We replace the momentum- 
conserving delta function (27 t) 3 (5 3 (J] k,) by f d 3 r exp(ij]k i • r), and then replace the dot product (ki • k 2 ) by an 
appropriately placed pair of derivatives with respect to r. We obtain: 


„ r . 3456 .3 /* 3 , 2 j3 / d 
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Following a standard trick [20], the next step is to Rayleigh expand each exponential as exp(*k; • r) = 
i e je(ki r )Ye m (?)Yl m {k) and do the angular parts of the fc-integrals, obtaining: 
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Next we split the dot product of gradients JA (>/ fjj. , as a sum of two terms: a term in which both derivatives act in 
the radial direction, and a term containing angular derivatives. More formally, we can write: 


dmdm = famy ± {dfrm 

4- dr, dr, \ dr J r 21 J ’ { J ’ 
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where /(r) is any scalar-valued function and 6 is the spin-raising operator (see Appendix A). Plugging this in we get: 

Q[a) = ^Satl' 9 ' T) J dr E J dr J d 2 nr|r 2 m(te, r)a^ m Y^ m (n)^ 

Y Uf ( te , r)ae' m > (iY e > m t (n)) | | (56) 


H ^'(tb.Oo/ 'm' Y e 'm' (n) I H"~ 

L \t'm' / 


£'m' 


where we have used the identity = \J t(i + l)(iW m ) and defined 

Ht{T E ,r) = J 2k ^ k e kTB k 5 / 4 P i (k) 3/4 A e (k)je(kr) 

vt{r E ,r) = J _ kT E )e kTE k 1,A P^{kf ,A \ t {k)j' t {kr) 

ue(r E ,r) = YEYlJl j ^-^(1 - kT E )e kTE k~ 3/4 P c (k) 3/4 A e (k)je(kr) (57) 

As anticipated, Q[a] is of contact factorizable form (50) after replacing the ( r E ,r ) double integral by a finite sum. 
This calculation generalizes to show that trispectra of the following types are contact factorizable: 

1. Any ((-trispectrum which is a product of functions f(k, : ) and any number of dot products of the form (k^ • k^). 
The local trispectrum in Eq. (1) is an example. In this case we obtain a factorizable representation by applying 
the projection formula (37) and approximating the r integral by a finite sum. 

2. Any trispectrum which arises from a local quartic operator in the inflationary action. We reduce to the previous 
case by writing the ((-trispectrum as a time integral, and replacing the (r E ,r) double integral by a finite sum. 

In Appendix C, we work this out explicitly for the local, (a 4 ), and (da) 4 trispectra. The resulting formulas 
(Eqs. (C2), (C5), (Cll)) are useful for reference, and for numerical evaluation of trispectra in our analysis pipelines. In 
this appendix, we also present our scheme for generalizing from the scale invariant case (assumed above for simplicity) 
to the case of a power-law spectrum P$(k) = A^k Us ~ 4 . 


D. Exchange factorizability 


As mentioned previously, there is another notion of factorizability for the trispectrum, “exchange factorizability”, 
which arises from exchange of a light particle during inflation, and also leads to a computationally fast form for Q[a\. 
It would be interesting to explore the phenomenology and data analysis of exchange trispectra. For example, quasi¬ 
single field inflation [35, 44, 84, 85] should generate interesting continuous families of trispectra, as the mass of the 
exchanged particle and the type of cubic operator are varied. However in this paper, we will restrict attention to 
contact factorizable trispectra, leaving this generalization for future work. The main technical obstacle in generalizing 
the machinery of this paper to the exchange factorizable case is developing an analogue of the optimization algorithm 
in §VII. In this section, we simply give the definition of exchange factorizability and a few examples. 

A CMB trispectrum is said to be exchange factorizable if the Q-symbol is given by: 


Qt[o\ = j^YYY M e J ( f d 2 n( ai+ 0 I Y ern (n)y ( Y ) ( Y B L a ^m 2 (Pi Y e 2 m 2 (n ))) ) 

V / Vw, “ JJ 

X ( f d 2 n' y j+Sj Y( m (n')y ( Y c l a e 3 m 3 ('yj Y e 3 m 3 (n')) \ ( Y J J + c.c. (58) 

\ J \e 3 rn 3 J \< 4 m4 // 


An exchange trispectrum is parametrized by integer spins ay, /?/, jj, Sj and coefficient arrays Ad\ J , Aj, Bj,, Cf , 
Dj , where I = 1, • • • ,Ni and J = 1, • • • , 7V 2 . Note that contact factorizability can be viewed as the special case of 
exchange factorizability where N\ = N 2 and = 6jj (with no £ dependence), so that the outer f-sum gives a delta 
function 5 2 (n — n'). It is easy to see from the definition (58) that Qt[o\ can be computed efficiently (more precisely, 
with cost ©(Wf'max + + ^l^^Lx)) by an appropriate sequence of fast spherical harmonic transforms. 
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The canonical example of an exchange trispectrum is the rjvL-trispectrum, defined previously in Eq. (3). To 
show that it is exchange factorizable, we first rewrite the (-trispectrum as: 

(CkiCksCkaCkJ = t nl j ^^P c (q)P c (k 2 )Pc{k A )(2TT) 6 S 3 {k 1 + k 2 + q)<5 3 (k 3 + k 4 - q) + (11 perm.) (59) 

Following the calculation in the last section, we plug into the 3D—>2D projection formula (37), Rayleigh expand both 
delta functions, and do the angular q and fc-integrals. When the dust settles we get: 
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Using the standard notation [3, 86 ]: 
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we can write this in the form: 


Qt[o] = f r 2 dr f r l2 dr' Y F Aa r') 

J J 

X / d 2 n Y e * m { n) Y a ei( r ) a ei mi Ye imi (f) Y Pe 2 ( r ) a e 2 m 2 Ye 2 m 2 (r) 

\ J Wimi / \i 2 m 2 J J 

X (j d 2 n' F/ m (n') f Y a e 3 (r')ae 3 m 3 Y e3m3 (r') \ f Y PeA r ') a e*m i Ye i m i (*') 


V377I3 


^€47714 


(60) 


(61) 


(62) 


Comparing this expression with the definition (58), we see that the t^vl trispectrum is exchange factorizable. This 
calculation generalizes to show that trispectra of the following types are exchange factorizable: 

1. Any (-trispectrum which is a product of functions f(ki), any number of dot products (kg • k 7 ), and one factor 
of the form /(|k 4 + k 2 |). 

2. Any (-trispectrum generated by an “exchange” diagram of the type shown in the right side of Eq. (48). 

Although our definition of exchange factorizability was constructed with inflationary trispectra in mind, there are 
also interesting non-primordial examples, for example gravitational lensing. The lensing trispectrum T^m 2 rn 3 mi can 
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found in Eq. (76) of [78]. Plugging this into the definition (27) of Q[a], we get the following expression for Qt[o]- 

Qr[a] = l E E 


s=±l s'=± 1 im 


^ | f (s^ra) | ^ ^ E \J£2(^2 + l)CE a ^ 2 m 2 2^2) J J 

V*' V^imi / V 2 m 2 // 

x ( [(s'Y (m y ( Y, ^^3 (E ^4(4 + i)c^a /4m4 ( s -y* 4m4 ) J J 

V \^3^3 / \^4^4 / / 


(63) 


Comparing with the definition (58), we see that the CMB lensing trispectrum is exchange factorizable. In this case, 
there is no obstacle to applying the machinery in this paper (since there are only a few terms in the trispectrum, we do 
not need an optimization algorithm). The pipelines we will develop in §IX could be used to give an optimal analysis of 
CMB lensing. Such an analysis would be qualitatively similar to other lens reconstruction analyses (e.g. [28, 29, 87]) 
but different in its approach to minimizing bias due to errors in modeling the two-point function. 


VI. FISHER MATRIX ALGORITHMS 

Consider an ideal CMB experiment with full sky coverage and isotropic noise. Such an experiment is completely 
specified by its noise power spectrum N(. Given angular trispectra T), • • • , Tjv, the iV-by-iV Fisher matrix is defined 

by 


F = 
1 w 


-Y 

4! ^ 


'v vmim 2 m3m4 \-*-3 /mim 2 77137714 


(Cti + N^)(Ci 2 + Nt 2 )(Ct, 3 + Ne 3 )(Ce 4 + N( 4 ) 


(64) 


and is interpreted as follows. If the CMB trispectrum is assumed to be a linear combination T = ^NL^i of the 
trispectra T il and the coefficients g' NL are jointly estimated using optimal estimators, then the estimator covariance 
is the inverse Fisher matrix: 

g °v(9nl,9nl) = ( F_1 )*i (65) 

The Fisher matrix is a powerful tool for forecasting and analysis of parameter degeneracies. 6 It will also play a central 
role in the trispectrum optimization algorithm which we will give in §VII. 

Computing the Fisher matrix directly from the definition (64) has computational cost 0(^m ax ) and is usually 
computationally prohibitive. In this section we will construct fast algorithms. 


A. Monte Carlo Fisher matrix algorithm 


A very simple fast algorithm for estimating the Fisher matrix is to use the following Monte Carlo procedure: 


F(T, T') 



{de, m Q T [a\)* [d^ m Q T ' [a]) 
C e + Nf 


( 66 ) 


where (•)„ denotes an expectation value over Gaussian random fields a^ m with power spectrum 1 /{Ci + Nf) (not 
power spectrum Ct + Np). The Monte Carlo error on the Fisher matrix is proportional to l/y/N mc , where N mc is 
the number of random realizations. In practice we find that the proportionality coefficient is very favorable; even one 
Monte Carlo realization is enough to approximate the Fisher matrix to «10% percent for the local trispectrum, or a 
few percent for the trispectra generated by one of the quartic operators a 4 , & 2 (dia) 2 , or (dicr) 2 (dj(j) 2 . 

The Monte Carlo algorithm in Eq. (66) may appear to be more complicated than necessary, since one can give a 
simpler Monte Carlo algorithm by simply estimating the variance of the all-sky optimal estimator for the trispectrum 


6 Of course, the assumptions of full sky coverage and isotropic noise will not be satisfied for a real experiment, but it is usually a good 
approximation to approximate the noise as isotropic, and account for sky coverage by scaling F —¥ / s |, v F. 
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T. However, the fractional Monte Carlo error of this simpler algorithm would by the “standard” y/2/N mc , so it 
takes many random realizations to obtain a useful estimate of the Fisher matrix. For this reason we always use the 
algorithm (66) to estimate the Fisher matrix by Monte Carlo. 

This is our first example of a phenomenon which will recur throughout the paper: there is an “obvious” Monte 
Carlo scheme which requires a large number of Monte Carlos, and an alternate scheme which is significantly faster. 
This phenomenon also occurs in the bispectrum context (e.g. Fig. 6 of [21]), where it was referred to as “fast MC”. 
We will see more examples shortly. 


B. Exact Fisher matrix algorithm for contact factorizable trispectra 

In this section, we will present an exact (i.e. non Monte Carlo based) Fisher matrix algorithm, which assumes 
contact factorizable trispectra. This is less generality than the Monte Carlo algorithm from the preceding section, 
which only requires a fast algorithm for evaluating the Q-symbol. 

Let T be contact factorizable with spins ay, /3j, 7/, Sj and coefficients Ajt, Bj(, Cit, Djt with / = 1, 2, • • • , Nf act . 
Likewise let T' be contact factorizable with spins a), fjj, jj, 5'j and coefficients A’ je , B’ je , C' je , D’ je with J = 
1,2,••• .JVZact- 


To obtain an exact expression for F(T,T r ), we calculate as follows. First write: 

F(T,T') = (Q T [a]Q T >[a]) 


f.C. 


(67) 


where a is a Gaussian random field with power spectrum 1 /(Cg + N/) and (-)f.c. denotes the fully connected part of 
the expectation value, i.e. the sum over Wick contractions in which all four contractions connect a factor of Qt[o] to 
a factor of Qt 1 [a]. We write the Q-symbols in the abbreviated form: 

Qt[o] = j / d 2 nM^/(n)M^/(n)M^(n)M^ J (n) + c.c. 

8 i=i J 

1 ^fsict /* / / / / 

Qt'M = J d2 " ,M 4 J ( a O M ^ J (nO M ^ J (“ , ) M ^' , (n , ) +C.C. (68) 


where we have introduced the following notation. If X = Xu is any ^-dependent quantity and s is an integer spin, 
then is the spin-s map defined by 


M*( n) =^2x e at m (sY im (n)) (69) 

im 

We plug the Q-symbols in Eq. (68) into the the expression (67) for F{T,T') and expand the result as a sum of Wick 
contractions. The contraction between two M fields or their complex conjugates is easy to calculate using the sum 
rule (A8) in Appendix A: 


Mf (n) Mf(n 1 ) = (-l)V*7'-(0) Mf (n) M s T(n')* = (-l) a C£ X ' + (0) 


(70) 


where 9 = cos 1 (n • n') is the angle between n, n' and we have defined correlation functions 

2 ^ + l x t x 't ,e ,p\ 

Css' (f)-(±!) z. 4n c e + Nt d ^ Ae) 


(71) 


When the dust settles, we get the following explicit formula for F(T,T'): 


2 ^fact -^fact 

F(T, T') = mtt E 

1=1 J=1 < 7 =+,- * 


^ d( COS 6) + 23 P erm -) ( 72 ) 


The integral can be evaluated exactly using Gauss-Legendre quadrature with (2f max + 1) points, since the integrand 
is a polynomial of degree 4£ max . For each quadrature point 8 , one ((-function value £(0) can be computed with cost 
CKAnax) using the recursion (A3). Thus the computational cost of computing F(T,T') is 0(N{ act Nf gct £^ nax ). 
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Let us compare the computational cost of the exact algorithm in this section with the Monte Carlo algorithm 
from the previous section (assuming contact factorizable trispectra). Suppose we have N tr total trispectra (i.e. the 
Fisher matrix being computed is lV tr -by-.ZV tr ) and each trispectrum has lVf act factorizable terms. The exact algorithm 
has cost 0(N 2 I N 2 gjCt £^ nax ), and the Monte Carlo algorithm has cost 0(lV mc ./Vtr-/Vfactt?max + N^N 2 ^ 2 iax ). 

Most interesting trispectra have factorizable representations with at most a few hundred terms (see Table I 
below), and the exact algorithm is actually faster due to the smaller power of £ max . The Monte Carlo algorithm is 
useful in situations where the total number of factorizable terms is very large. For an example, see Appendix D, 
where we describe a Fisher matrix based convergence test on numerical calculation of trispectra. The Monte Carlo 
algorithm is also the only option for exchange factorizable trispectra. For example, we will use the Monte Carlo Fisher 
matrix to compute the lensing bias to our WMAP g^L estimates (see Eq. (93) below). 


VII. OPTIMIZATION ALGORITHM FOR CONTACT FACTORIZABLE TRISPECTRA 

Consider the trispectrum generated by a quartic operator such as a 4 . So far, we have proposed a scheme 
for representing the tripsectrum in contact factorizable form (50), and shown that this representation reduces the 
computational cost of data analysis from ax ) to 0(Nf act £^ ngx ), by providing a fast algorithm for computing the 
Q-symbol Q[a\. However, this is not quite enough to bring the computational cost fully under control, since the 
number of terms lVf act in the factorizable representation can be very large. 

For example, consider the trispectrum generated by the quartic operator a 2 (da) 2 . To represent it in factorizable 
form, we must approximate the double (te,t) integral in Eq. (56) by a finite sum. To accurately approximate the 
detailed (£, m) dependence of the trispectrum, a huge number of sampling points in the (te , r) plane is required. This 
issue is studied in detail in Appendix D. As explained there, our sampling scheme has the property that the finite- 
sampled trispectrum approximates the exact trispectrum in a controlled sense: there is an end-to-end convergence 
test which shows that the two are nearly equal in the metric defined by the Fisher matrix. However, this requires 
many sampling points in the (r^,r) plane, e.g. for the operator a 2 (da ) 2 and WMAP noise levels, we find that 31763 
sampling points are needed! 

Fortunately, there is an optimization algorithm, first proposed for the bispectrum in [21], which can dramatically 
reduce the number of terms in the factorizable representation. The input to the algorithm is a trispectrum which has 
been represented in contact factorizable form with a large number N- m of terms. We write: 

N in 

Tin = Y.Ti (73) 

/=l 

where Tj is the 7-th term in the factorizable representation. The output is an “optimized” representation obtained 
by linearly combining a small subset of terms in the input representation. Formally: 

Nout 

Tout = ^2 w J Ti j ( 74 ) 

j= l 

with the subset {T/ 1 , T/ 2 , • • • } of terms and weights wj determined by the optimization algorithm. 

The first step in the optimization algorithm is to compute the Ni n -by-N- ln Fisher matrix Fjj = F(Tj,Tj) of 
individual terms in the input representation. We use the exact Fisher matrix algorithm from §VIB to compute 
Fjj. 7 Once this matrix has been computed, the optimized representation T out can then be computed using a purely 
formal linear algebra procedure described in §V.A of [21]. (This procedure was developed for purposes of optimizing 
the bispectrum, but the Fisher matrix contains all the information needed for the optimization, and once it has 
been computed it no longer matters whether the underlying objects are bispectra or trispectra.) The optimization 
algorithm guarantees that the input and output trispectra are nearly equal, in the sense that 

T(T in - Tout, Ti n - T out ) < 10- 5 E(T in , T in ) (75) 

where F(T,T') denotes the Fisher matrix element. This definition of “nearly equal” means that the two trispectra 
cannot be distinguished observationally with statistical significance. Because the Fisher matrix depends on the noise 
power spectrum, the optimized trispectrum depends weakly on the noise properties of the experiment being considered. 


' In this case, the exact Fisher matrix algorithm has computational cost whereas the Monte Carlo Fisher matrix algorithm 

from §VI A has cost 0(N mc An, f) nax + N mc AA) f( nax ). The exact algorithm turns out to be faster even for modest values of iV mc . 
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There is one more wrinkle: for the large input representations considered here with N- m > 10 4 , computing the 
Fisher matrix Fjj is a computational bottleneck. To get around this problem, we use a two-stage optimization 
algorithm as follows. We divide the input representation into M “chunks” of size (N- m /M), where typically M = 16 
or 32, and optimize each chunk separately. (Note that the total cost of optimizing all chunks is less than the cost 
of optimizing their sum, since the exact Fisher matrix algorithm scales as N 2 act , not 7Vf act .) We then combine the 
optimized chunks to obtain a semi-optimized representation of the input bispectrum, and do a second pass of the 
optimization algorithm to obtain the final optimized representation. 

In Table I, we show results of applying the optimization algorithm to the g l ^ c L , a 4 , a 2 (da) 2 , and (da) 4 trispectra. 
It is seen that the optimization algorithm results in a dramatic reduction in the size of the factorizable representation. 
The optimized representations will be used throughout the rest of the paper. 


Trispectrum 

Ni n 

N 0 ut 

fljVL 

960 

16 

d 4 

31763 

52 

a 2 (da) 2 

63526 

110 

(da) 4 

95289 

141 


TABLE I: Number of factorizable terms A r ; rl needed to represent each trispectrum by “brute-force” replacement of integrals by 
finite sums (see Appendix D for details), and number of terms N out obtained after running the optimization algorithm with 
WMAP noise levels. 


VIII. FISHER MATRIX ANALYSIS OF THE TRISPECTRA & 4 , & 2 (da) 2 , AND (da) 4 


In this section, we will study correlations between the trispectra {a 4 , a 2 (da) 2 , (da) 4 }, using the CMB Fisher 
matrix studied in §VI. 

We note in passing that for primordial trispectra, there is an alternate, simpler choice of Fisher matrix defined 
by: 


F(T U T 2 ) 


d 3 kr d 3 k 2 d 3 k 3 d 3 k 4 (Ck,Ck 2 Ck 3 Ck4)l <Ck 1 Ck 2 Ck 3 Ck 4 ) / 2 

(2tt) 12 P C (ki)Pc (W (W (*4) 


(2tt) 3 <5 3 (]>>,) 


(76) 


This is the appropriate definition for an observer who sees all ((-modes in a 3D volume (as opposed an observer 
who sees all CMB modes on a 2D sky). In the bispectrum case, the 3D Fisher matrix and the 2D CMB Fisher 
matrix tend to give nearly identical results in practice. However, this need not be so for the trispectrum, since 
3D—>2D projection actually reduces the dimensionality of the parameter space. The 3D (-bispectrum and the 2D 
CMB bispectrum are both functions of three parameters (assuming translation and rotation invariance in the 3D case, 
and rotation invariance in the 2D case). In contrast, the 3D ((-trispectrum is a function of six parameters, but the 
2D CMB trispectrum is a function of only five. As a point of mathematical principle, this implies that there must 
exist examples of (-trispectra which are weakly correlated in 3D, but become highly correlated when projected to 
the CMB. For this reason, we have used the CMB Fisher matrix throughout this section rather than the simpler 3D 
Fisher matrix (76), but we actually find that the two Fisher matrices agree well for the trispectra under consideration. 

Let us recall the Fisher matrix analysis for the bispectrum which gives rise to the parameters f^ L and f^L 1 [6]. 
There are two cubic operators to consider, 7r 3 and Tr(dn) 2 . These generate bispectra which are nonidentical, but 
correlated at the «0.9 level. This level of correlation is not so large that the two operators can be treated as indistin¬ 
guishable, but is large enough that orthogonalization is convenient [6]. We therefore apply a linear transformation in 
the parameter space (ft 3 , Tr(dTr) 2 ) to define (approximately) decorrelated observables /jv L ,/)y£ h - 

Analogously, for the trispectrum, the three quartic operators {a 4 , a 2 (da) 2 , (da) 4 } generate three distinct trispec¬ 
tra. Using the exact Fisher matrix algorithm from §VIB, the correlation matrix between these trispectra is found to 
be: 


/ 1 0.9484 0.7558 \ 

0.9484 1 0.9083 

\ 0.7558 0.9083 1 ) 


(77) 


From this Fisher matrix, it can be shown that any of the three trispectra is highly correlated to a linear combination 
of the other two. For example, the trispectrum a 2 (da) 2 is 99.2% correlated to a linear combination of a 4 and (da) 4 . 
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Therefore, we will not treat a 2 (da) 2 as a new trispectrum which is independent of the other two. More concretely, 
we can convert g < ^ r jf a ' 1 into the following effective values of g^ L and g: 


(9NL )eff — ®-620g N L 


(da) 


(»S?F) 


eff 


0.0936</ 


a 2 (da) 2 

NL 


(78) 


Note that we have choose our trispectrum basis to simply be the coefficients of the operators a 4 and (da) 4 , rather 
than orthogonalizing as in the case of the bispectrum. This somewhat simplifies the analysis and interpretation, but 
it should be kept in mind that the two operators are «75% correlated. 

The local trispectra g l § c L is not particularly correlated to any of the quartic operator trispectra 
(a 4 , a 2 (da) 2 , (da) 4 ). This can be understood by noting that the local trispectrum gets most of its signal-to-noise 
from the squeezed limit k± <C min (fey, k^, k±), whereas the other trispectra vanish in the squeezed limit. 

Throughout the preceding Fisher matrix analysis, we have used WMAP noise levels. If we use Planck noise 
levels instead, the results are qualitatively unchanged but the numerics are slightly different. The correlation matrix 
between the a 4 , a 2 (dia) 2 , ( di.a) 2 (dja ) 2 , and (dia) 2 (dja) 2 trispectra is: 


/ 1 0.9113 0.6142 \ 

0.9113 1 0.8572 (Planck noise) (79) 

\ 0.6142 0.8572 1 ) 


The a 2 (di.a) 2 shape is 98.6% correlated to a linear combination of the other two shapes. The coefficients which convert 
9 < NL d<7 ' > effective values of g^ L and g are: 


(g & NL )eS = 0.597 g"" 


(da) 2 


(gNL )eff = 0.0914g^' 9<T) (Planck noise) 


(80) 


IX. ANALYSIS PIPELINES 

In this section, we develop an analysis pipeline for estimating the amplitude of a trispectrum T for a realistic CMB 
experiment. We will actually develop two analysis pipelines which are appropriate for different sets of assumptions. 

In some experiments, it is computationally feasible to multiply a harmonic-space map by the operator C~ l which 
appears in the optimal trispectrum estimator (Eq. (24)). For example, this is possible for WMAP, since the noise 
model is simple: it is an excellent approximation to treat the noise covariance as diagonal in the pixel domain. In §IX A 
below, we develop an optimal pipeline for such experiments. 

In other experiments, it is infeasible to multiply a map by C~ , either because this is too computationally slow, 
or because the noise model is too complicated. The case we have in mind is Planck, although we will not attempt 
a Planck trispectrum analysis in this paper. The foreground-cleaned maps used for non-Gaussianity analysis by 
the Planck collaboration [9] have a noise covariance which in principle is determined precisely by the scan strategy, 
timestream noise properties, and foreground cleaning method. However, pixel-pixel correlations are important, exact 
multiplication of a map by N~ 4 is likely to be as expensive as full map-making, and multiplication by C~ 4 (requiring 
iterated multiplication by N~ 4 ) is likely prohibitive. Fortunately, we can still proceed by implementing a filter which 
approximates but is not precisely equal to C -1 . Another feature of the Planck analysis is that making Monte Carlo 
simulations of foreground cleaned maps is expensive. A common set of Monte Carlo simulations is shared between the 
Planck trispectrum analysis, bispectrum analysis, and other analyses, but it is impractical to make new simulations 
specifically for the trispectrum pipeline. 

With these considerations in mind, in §IX B below, we propose a “pure MC” pipeline which compares the 
trispectrum of the data to the trispectrum of an external set of Monte Carlo simulations, using a filter which is not 
necessarily equal to C~ 4 . 

An important property of the pure MC pipeline is that it does not assume that the simulations are Gaussian. 
For example, we might use lensed CMB simulations, which have a nonzero trispectrum. In this case, the pure MC 
pipeline is constructed so that it estimates the trispectrum of the data in excess of the simulations, i.e. lensing bias 
will automatically be subtracted from the estimated trispectrum. 

We have not worked out how to remove lensing bias in the optimal pipeline, since our immediate goal is to use 
the optimal pipeline to analyze WMAP, where lensing is a small effect. In cases where lensing bias is small, we can 
accurately approximate it using a Fisher matrix based estimate; see discussion near Eq. (93) below. 

In a case where C~ 4 is affordable but lensing bias is large, currently our only way of obtaining optimal error bars 
with reliable lensing bias subtraction would be to run the pure MC pipeline with C _1 filtering rather than running 
the optimal pipeline. This has one disadvantage: the optimal pipeline is much faster to converge than the pure MC 
pipeline, since we can use the assumption of Gaussian simulations to give a “fast MC” algorithm. The ultimate 
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pipeline would combine the fast convergence of the optimal pipeline and bias subtraction properties of the pure MC 
pipeline, but we defer construction of such a pipeline to future work. 


A. Optimal pipeline 

In this section we describe our first pipeline: an optimal pipeline which can be applied to experiments where 
C~ l filtering is practical. Although more general, this pipeline was developed with WMAP in mind. Let us state our 
assumptions explicitly: 

1. The observed CMB at m = st m + nt m is the sum of the true sky signal st m and a Gaussian noise realization nt m . 
(Our convention here is that at m denotes the beam-deconvolved map.) 

2. Given a harmonic-space map bt m , computing (C~ 1 b)e m is computationally feasible. 

3. It is also computationally feasible to randomly generate a signal + noise realization, i.e. a Gaussian random 
map b(, m with covariance matrix C. 

Throughout this section we will use the abbreviated notation 

atm = (81) 

As shown previously (Eq. (28)), the optimal estimator is £[a\ = (1 /F)£ 0 [a], where £o[a] can be computed as a Monte 
Carlo average over Gaussian signal+noise realizations b: 

£ 0 [a] = (Q[a, a, a, a] - 6<3[a, a, b, 6] + Q[b, b , 6, b]^ (82) 

The quantity F was defined previously in Eq. (25). It determines both the normalization of the estimator and its 

variance. More precisely, Var(£) = 1/F or equivalently Var(£o) = F. 

Since evaluating £q by Monte Carlo is straightforward given our assumptions, the only issue in the optimal 
pipeline is an algorithm for computing F. This involves some nontrivial computational challenges, as we now explain. 

Since F = Var(£ 0 ), one natural approach is to evaluate £$ on an ensemble of Gaussian simulations and estimate 
the variance to get F. Unfortunately, if implemented naively, the computational cost of this approach is 0(N^ C ), 
not 0(N mc )\ This is due to a curious property of the estimator (82): if we want to evaluate the estimator on a 
new realization a, we must recompute the Monte Carlo average (Q[a,a,b,b]) “from scratch” by looping over random 
realizations b with computational cost 0(N iac ). 

One idea for reducing the computational cost from 0(N^ C ) to 0(N mc ) is to group Monte Carlo simulations 
into pairs (bi,b[), ( b 2 , b' 2 ), • • •. We then express F as a Monte Carlo average involving only expressions which can be 

computed from a single pair, for example Q[bi, 6»] or Q[bi, bi , &'], but not Q[bi, 6j, bj, bj]. 

A second, more technical idea for reducing computational cost is to use Monte Carlo averages involving (dQ), 
which converge much faster than averages involving Q. This was noted previously in our discussion of the isotropic 
all-sky Fisher matrix (§VIA). 

Combining these ideas, we express F as the following Monte Carlo average over pairs (6, b r ): 

F = ^ &]) + (^QC^,6',6'])) 

((^ m Q[5,6,6 , ])C^ ro ,(^ w Q[6,6, &']) + (d em Q[b, V, {d e > m 'Q[b, b', &'])) 

{(dt m Q[bM)Ci^ e , m ,(dt' m 'Q[b,b',b'}) + («9^Q[6', 6', 6'])^^(«9^wQ[6,6, fo'])) . (83) 

The specific choice of coefficients (1/32,9/32,-3/16) is motivated in Appendix E. 

It is sometimes useful to know the “error on the error”, i.e. the statistical error on our estimate of F due to the 
finite number of Monte Carlos. We estimate this straightforwardly, since F is an average over pairs (6, 6'), so we can 
estimate its uncertainty from the scatter between pairs. The estimator for F given in Eq. (83) has been designed to 
minimize this scatter, and in practice we do not need many Monte Carlos to get convergence. 

As previously mentioned, we emphasize that the optimal pipeline assumes Gaussian statistics and in particular 
does not subtract lensing bias. Note that simply using lensed simulations in the optimal estimator (82) does not 
correctly remove lensing bias. For WMAP the lensing bias is small, but in a case where it is large and must be 
subtracted accurately, then the only option is the pure MC pipeline which we present next. 
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B. Pure MC pipeline 


We now describe our second pipeline, a pipeline which operates on an external ensemble of Monte Carlo simula¬ 
tions. We start by choosing a filter which can be applied to the data to produce a harmonic-space map ai m . To obtain 
near-optimal statistical errors, the filter should be chosen to approximate C -1 filtering as closely as possible. For 
example, to analyze Planck data, we could use the same filtering used for the bispectrum analysis [9]: we start with 
foreground-cleaned maps in pixel space, inpaint the mask, transform to harmonic space, and multiply by 1 /(C'c + Nf ), 
where Ni is a sky-averaged noise power spectrum. This filter is suboptimal in principle, since it is not precisely equal 
to C -1 , but has been shown to be near-optimal for Planck, at least for the bispectrum. 

Let us state the assumptions of our “pure MC” pipeline: 

1. The observed sky is specified as a filtered harmonic-space map a^ m , and we want to compare its trispectrum to 
a set of external simulations, also specified as filtered harmonic-space maps 6^,6^, • • •. (In this section, we 
use tildes to denote any map which has been processed by the filter.) 

2. The observed sky is a sum of signal and noise components a^ m = s# m + ne m (and likewise for the simulations). 
The filtered signal S(, m is related to the true CMB sky by a linear operator T, i.e. S£ m = (Ts)^ m , and the signal 
and noise are statistically independent. 

3. For each simulation, we know the underlying CMB realization se m which was used. An important feature of the 
pure MC pipeline is that we do not assume that either the CMB or noise realizations used in the simulations 
are Gaussian. If the simulations are non-Gaussian, then the trispectrum estimator will return an estimate of 
the trispectrum amplitude in excess of any trispectrum which is in the simulations. This is very convenient 
in practice. For example, if the CMB realizations may be lensed, and the noise realizations include residual 
foregrounds, then lensing and foreground contributions to the trispectrum will automatically be subtracted. 

4. Given an arbitrary CMB map 6^ m , there is a fast algorithm for computing ( Tb )^ m . Typically this will involve 
convolving with a beam or instrumental response, taking the spherical transform to pixel space, then applying 
the same filter which was applied to the data. 

Our pipeline will use an estimator of the form £{a\ = (l/.F/v)£o[o] ; where the unnornralized estimator is defined by 
the Monte Carlo average: 


£ 0 [a\ = Q[a , a, a, a] — 6 


Q[a,a, b, b ]) - 


Q[b,b,b,b]) +6 (Q[b,b,b',b r 


b,V 


(84) 


and the normalization Fn will be specified shortly (Eq. (85) below). 

It is easy to verify two key properties of this estimator. First, its expectation value over the simulations vanishes: 
(£(j[6]) = 0. This means that £q [a] measures the trispectrum of the data relative to the trispectrum of the simulations, 
as desired. Second, if the two-point function of the simulations does not perfectly match the two-point function of 
the data, due to slightly incorrect cosmological parameters or noise model, then the bias on £q will be second order. 
As shown previously in §111, this is an important property of the optimal estimator, and we would like to preserve it 
in our pure MC pipeline. 

The last term in the estimator (84) is a double Monte Carlo average over pairs of simulations ( b,b '). This is 
necessary because we are not assuming Gaussian simulations. If simulations were Gaussian, then we could use the 
relation (Q\b, b , b, b])b = 3(Q[6, b, b ', b'])b,b’ to rewrite the last term as an average over single simulations b. Note that 
if we assumed Gaussianity, and also assumed optimal filtering (i.e. a = C^ 1 a) then the estimator £q would reduce to 
the optimal estimator studied previously. 

In our pure MC pipeline, we would like to compute the unnormalized estimator £ 0 [a], the estimator normalization 
Fjv, and the variance Fy = Var(£ 0 ) by Monte Carlo. We would also like to compute the “error on the error”, i.e. the 
uncertainty in Fy when we estimate it by Monte Carlo. Let us now discuss each of these in turn. 

Considering first the estimator normalization Fn, one can show that it is given by the following Monte Carlo 
average over pairs of simulations ( bi m ,b ' ern ). 


F n = (^(dt m Q[b,b,b])(TdQ[s,s,s])e m + ^(de m Q[b, b, b'])(TdQ[s, s, s']) em 


3 
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(d em Q[b,b,b})(TdQ[s,s',s']) im - ^(dt m Q[b,b',b'])(TdQ[s,s,s])e m + ° ) (85) 

oz \s s / / 

/ b,b' 
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where (s£ m , s' em ) denotes the underlying CMB realizations used in simulations ( b, b'). The specific choice of coefficients 
here was motivated in the previous section (see discussion near Eq. (83)). This expression is a “fast MC” scheme, in 
the sense that the fractional error in Fn is much better than yj2/N mc . 

Next we consider the estimator variance Fy = Var(£ 0 ). Note that in the optimal pipeline from the previous 
section we had Fn = Fy , but in the pure MC pipeline where the filter need not be precisely equal to C _1 , one has 
F/v ^ Fy in general. A short calculation gives the following general expression for Fy: 

Fy = ( (Q[6, b, b, b) - 6 Q[b, b, b’, &']) (q[6, b, b, b] - 6 Q[b, b, b", &"]) ) - (<?[&, b, b, b] - 6 Q[b, b, V, V)^ (86) 

To write this in a slightly different way, let us temporarily imagine that we have computed the quantity 

Qij = Q[bi,bi,bj,bj\ (87) 

for every pair of simulations (bi,bj). Note that computing every Qij has computational cost 0(N{ 2 nc ), which is 
something we are trying to avoid, but we will address this shortly. Then the following estimator has expectation value 
(Fy) = Fy. 

p y = E 1 Qii ~ w E + w E ( 88 ) 

v . * N2 m A;! b,c 

— QiiQ-j j + -Jy- QiiQjk — jy - QijQkl 

2 { ij } 3 {ijk} 4 { ijkl} 

where we have defined = N mc (N mc — l) ■ ■ ■ (N mc — k + l), and ’Yhujk---} denotes a sum over distinct indices i,j,k,--- 
between 1 and N mc . This can be simplified slightly by defining 


Rij — 


6 Q ij Qa Qjj if i 7 ^ j 
0 if i = j 


A short calculation then shows that Fy simplifies to: 


Fv — jy 5Z RijRik jy 5Z F-ijRki 

3 {ijk} 4 {ijkl} 


(89) 


(90) 


We would also like to estimate the “error on the error”, i.e. the variance of Fy. It is easy to see that the following 
estimator has expectation value Var(Fy): 


^ {Fy ) jy ^ ^ Rij RikRlmRln “b y. ^ ^ Rij RikRlmRno y ^ ^ 


{ijklmn} 


Nj 


{ ijklmno } 


JV« 


RiiRkiR mnRi 


op 


(91) 


{ij klmnop} 


Evaluating E in this form has 0(N* lc ) computational cost, which may be a problem in practice. In Appendix F we 
give an equivalent expression with lower cost. 

Summarizing results in this section so far, we have expressed the unnormalized estimator £ 0 , the normalization Fjy, 
the variance Fy, and the “error on the error” E as Monte Carlo averages (Eqs. (84), (85), (90), (91)). These expressions 
contain double Monte Carlo averages over pairs of simulations (6,6'), naively leading to 0{N^ C ) computational cost. 
We avoid this as follows. We divide the ensemble of N mc simulations into ( N mc /M ) subsets containing M simulations 
each. For each such subset, we evaluate Eqs. (84), (85), (90), (91) using only the M simulations in the subset (replacing 
N mc when it appears by M of course). We then average over all subsets to obtain our final estimates for £ 0 , Fn, F v , 
and E. This reduces computational cost from 0{N^ C ) to 0(N mc M). 

For a fixed total number of Monte Carlo simulations N mc , the larger we choose M, the more accurate our estimate 
for the variance Fy will be, but the computational cost of the trispectrum pipeline will also increase. Note that we 
must choose M > 8 in order for the expression (91) for E to make sense. We have found that M = 16 or M = 32 is 
usually a good compromise. 


X. WMAP RESULTS AND INTERPRETATION 


We conclude this paper by constraining the parameters g l § c L , g^L, and from WMAP data. 
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In WMAP, the C' _1 -filtering operation is computationally feasible, so we can use the optimal pipeline from §IX A. 
We implement C~ 4 -filtering using the multigrid conjugate gradient algorithm from Appendix A of [88]. The C~ l 
filter optimally combines data from the six V-band and W-band WMAP channels (V1,V2,W1,W2,W3,W4), and 
incorporates the kq75 sky mask by assigning infinite variance to masked pixels. The filter marginalizes foreground 
templates, sky monopoles, and dipoles in an analogous way, by assigning infinite variance to the appropriate pixel- 
space modes, independently in each of the six channels. These details of the filtering are the same as the optimal 
bispectrum analysis in the WMAP nine-year results paper [34] (see also [89]). 

We ran the optimal pipeline with N mc = 2048 Monte Carlo simulations, which turned out to be overkill: the 
“error on the error” due to the finite number of simulations was 0.4% for the local gNL, or 0.05% for the other two 
trispectra. The constraints from our pipeline are: 

g l ° c L = (-3.71 ± 2.19) x 10 5 

g^ L = (-2.32 ± 3.09) x 10 6 (92) 

g^ = (-9.07 ± 6.33) x 10 5 

No statistically significant deviation from Gaussian statistics is seen. This is the first constraint on the (<9a) 4 trispec¬ 
trum. Constraints on the other two trispectra have been previously reported as follows. 

The optimal estimator for g l £ c L has also been implemented in [19], where the constraint g^ L = (—3.3 ±2.2) x 10 5 
was obtained. This agrees nearly perfectly with our result in Eq. (92): the error bars are identical and the central 

• 4 

values differ by 0.2a. This is expected since the optimal estimator contains no free parameters. The g^ L trispectrum 
was studied in [17], where the constraint g a NL = (—2.88 ± 6.94) was obtained. 8 The smaller error bar in Eq. (92) is 
partly due to our use of WMAP9 data rather than WMAP5, and partly due to our use of the optimal estimator. 

The g NL central values in Eq. (92) are slightly biased by gravitational lensing. We can study the bias sernian- 
alytically using the Fisher matrix. Since the lensing trispectrum Ti ens is not contact factorizable, we cannot use the 
exact Fisher matrix algorithm (§VIB), but since Ti ens is exchange factorizable, we can still use the Monte Carlo Fisher 
matrix algorithm (§VIA). 

Using a Fisher matrix forecast with WMAP9 noise levels, we find that the correlation coefficients of the lensing 
trispectrum with the local, d 4 , and (dcr) 4 trispectra are 0.02, 0.15, and 0.14 respectively, and the total signal-to-noise 
of the CMB lensing trispectrum is only 2.1. This makes it intuitively clear that the lensing bias is small. 

To quantify this better, we can estimate the lensing bias to each of the gjy l parameters semianalytically as 
follows. For any primordial trispectrum T, we approximate the bias as A g^L = F(T, T\ ens )/F(T, T). This gives the 
following estimates for lensing bias: 

A g§l = 9.24 x 10 3 Ag% 4 L = 8.82 x 10 5 A g^ = 1.71 x 10 5 (93) 

which are 0.04a, 0.3 <j, and 0.3a shifts respectively. Subtracting lensing bias, our final “bottom line” trispectrum 
constraints are: 

9nl = (-3.80 ± 2.19) x 10 5 

9n L = (-3.20 ± 3.09) x 10 6 (94) 

g^ = (-10.8 ± 6.33) x 10 5 

The above semianalytic prescription for lensing bias is approximate, since it is only valid to lowest order in Cf^, and 
also approximates the noise covariance of the survey as all-sky isotropic. Note that since the CMB lensing trispectrum 
is not scale-invariant, the preceding Fisher matrix based results are specific to WMAP noise levels, and lensing will 
be a larger effect in Planck. For WMAP, where lensing is small, the semianalytic bias correction is adequate, but for 
Planck a more accurate treatment will be necessary. 

Each “bottom line” trispectrum constraint in Eq. (94) is a constraint on a single g^L parameter assuming that 
the other g ,v l - parameters are zero. We also consider the case of a joint constraint on the parameters (g%L- 9nV ); 

which are 75% correlated. It is convenient to introduce the vector notation g t = (<7^’Swi'* )• Let F i3 be the two- 
by-two Fisher matrix, which can be constructed as follows. The diagonal is given by Fa = l/af, where a* is the 


The parameter C/ 1 / 11 defined in [17] is related to our g^ L by %!) n = (27/25 )g^L 
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9 nl / 10 7 

FIG. 2: 68% and 95% confidence regions in the (g^Li 9 nl? ) plane, with the Lorentz invariant model in Eq. (99) shown as the 
dashed line. 


g^L statistical error in Eq. (94). The off-diagonal is then given by F V2 = r^F^ 2 F^ 2 , where 7 'i 2 = 0.7558 is the 
correlation coefficient from Eq. (77). From this procedure we obtain the Fisher matrix: 


( 1.05 3.86 
3.86 25.0 


x 1CT 13 


(95) 


Let gi = (—3.20 x 10 6 , —1.08 x 10 6 ) be the vector of gNL estimates appearing in Eq. (94). Now for a given parameter 
vector gi, we define a trispectrum x 2 -value by: 9 

X 2 (g) = ((- Fg)i - F ll g l )F~ ] \{Fg) j - F J3 g 3 ) (96) 

This x 2 can be thresholded to obtain constraints in various parameter spaces of interest. For example, we can plot 
error ellipses in the {g < NL’9N < L )-pl ane i showing the off-diagonal correlation (Fig. 2). The 68% and 95% regions are 
obtained by thresholding at y 2 = 2.279 and x 2 = 5.991 respectively, as appropriate for a \ 2 random variable with 
two degrees of freedom. 

We conclude with some brief physical interpretation. In single-field inflation, only the quartic operator 7T 4 is 
allowed by the symmetries to induce a large trispectrum without generating an even larger bispectrum. Its coefficient 
M 4 in the action (9) is related to g a NL by Eq. (13). Using our WMAP constraint from Eq. (94), we get the following 
constraint on A7 4 : 

M 4 r 3 

(-2.47 x 10 15 ) < —^ < (7.86 x 10 14 ) (95% CL) (97) 

We can develop a more intuitive understanding of this limit by noticing the following two facts [31]. First, in the 
case where in single field inflation this operator leads to observable non-Gaussianities, the speed of sound is not 
expected to be parametrically smaller than one: c s < 1. Second, the unitarity bound of the theory, Ajj, scales as 
A fj ~ 167r 2 (i7M| 1 ) 2 M 4 7 4 . Therefore the limit in Eq. (97) translates to g^ L A^ ~ (H 4 /Af r ) < 10 —3 , which is of 
order the inverse square root of the number of signal-dominated modes in WMAP, as expected from a Fisher matrix 
analysis. 

In multifield inflation, the limits above can be translated into limits on the ratios H/ (27rAi) between the de Sitter 
temperature H/(2tt) during inflation and the scale suppressing the higher dimension operators in the action (12), in 
this way effectively mapping cosmological information into constraints of parameters of a fundamental Lagrangian. 10 * * * 


9 To derive this y 2 , we note that the unnormalized trispectrum estimator F,,g, has expectation value Fijgj and covariance matrix 
Cov(Fagi, Fjjgj) — Fij. 

10 Some readers might wonder why we call the effective field theory of inflation as a fundamental theory. Even though it is just an effective 

theory valid up to a scale of order A, this energy scale is still extremely large, and indeed quite fundamental. Furthermore, since we are 

not probing directly the energy scale active during inflation, but we do this only indirectly though the CMB and the LSS, the effective 

field theory of inflation is the only theory we are testing through observations, unless we make further assumptions. 
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Generally, our results show that ( H/(2nA )) 4 must be smaller than 10 -2 or 10 -3 . This tells us that the quartic 
interactions are still largely unconstrained, and new cosmological probes, such as Large Scale Structure surveys, will 
be required to significantly improve the limits. For example, if A 2 and A 3 are assumed to be zero, then our WMAP 

• 4 

constraint on g^ L in Eq. (94) gives the following constraint on A 3 : 


-8.09 x 10 -3 < 


1 H 4 

(2^F Af 


< 2.57 x 1CT 3 


(95% CL) 


(98) 


As another example, consider a one-parameter space consisting of the Lorentz-invariant quartic interaction: 

s = J *v /= fl^(W 2 + i ||(9/) 2 (9^) 2 j (99) 

For a given value of A, we use Eq. (14) to compute g^L coefficients, use Eq. (78) to absorb g a into the values 

of g^ L and g^ , then compute y 2 using Eq. (96). Thresholding this y 2 in Eq. (96) at Ay 2 = 4, we obtain the 95% 
confidence limits: 


-4.42 x 10" 4 < 


1 H 4 
(2tt) 4 A 1 


< 4.00 x 1CT 5 


(95% CL) 


( 100 ) 


XI. DISCUSSION 

The main conclusions of this paper are as follows: 

• To lowest order in the derivative expansion, the quartic operators allowed by the symmetries of inflation are a 4 , 
a 2 (dcr) 2 , and (da) 4 . In single-field inflation, only the a 4 operator is allowed by the symmetries to induce a large 
trispectrum without generating an even larger bispectrum, but multifield inflation allows an arbitrary linear 
combination of the three operators. A Fisher matrix analysis shows that there is one near-degeneracy between 
these three operators, which we can use to approximate the a 2 (da) 2 trispectrum as a linear combination of a 4 
and (da) 4 . 

i .4 (dcr ) 4 

• Based on this analysis, we propose the parameter space (g l § c ^, g^Li 9 nl ) as a starting point for analyzing 
inflationary 4-point signals. This is roughly analogous to the parameter space (/]((£, f % 4 L , /v} 4 ') for the 3-point 
function. It will be interesting to explore 4-point signals beyond these leading ones. In particular, “exchange” 
trispectra arising from cubic operators and exchange of a light field during inflation are not included in this 
parameter space, and would be interesting to study in future work. 

• We propose two factorizability conditions for the trispectrum, contact factorizability and exchange factorizability, 
and study the contact factorizable case in detail. We argue that, in order to apply to operators with spatial 
derivatives such as (da) 4 , the definition of factorizability for the trispectrum must include higher-spin fields. 

• For each of our trispectra, we write the CMB trispectrum as either a single integral over a radial coordinate r 
(in the case of the local trispectrum) or a double integral over (te,t) using a Feynman diagram (in the case of 
the quartic operator trispectra). By approximating the integral by a finite sum, we represent the trispectrum 
as a sum of a large number of factorizable terms, then apply an optimization algorithm to obtain a compact 
factorizable representation. 

• We emphasize that the final compact representation obtained in this way approximates the exact trispectrum 
in a controlled sense: the two are nearly equal in the metric defined by the Fisher matrix. This is because 
our integration scheme includes an end-to-end Fisher matrix based convergence test (Appendix D), and the 
optimization algorithm is also guaranteed to converge in the Fisher matrix metric. 

• We develop a toolkit of algorithms which can be applied to factorizable trispectra, including estimator evaluation, 
non-Gaussian simulation (Appendix B), and Fisher matrix calculation. 

• We develop an optimal trispectrum pipeline and apply it to WMAP, finding consistency with Gaussian statistics. 
We also develop a “pure MC” pipeline which scales to Planck. The optimal pipeline can be used if C -1 is 
computationally affordable and lensing bias is small enough to be estimated semianalytically. The pure MC 
pipeline relaxes both of these assumptions but is slower to converge. 
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• The tools we have developed in this paper are sufficient to analyze the local, a 4 , a 2 {da ) 2 , and {da) 4 trispectra 
in WMAP and Planck. However there are a few directions in which our machinery might be improved. In order 
to analyze exchange trispectra, one would need to generalize the optimization algorithm from §VII. It would 
be interesting to improve the sensitivity of our pipeline to noise modeling, as suggested in §IIID. Finally, in 
cases where C _1 is affordable, we currently have to choose between the fast convergence of the optimal pipeline 
(§IX A) and precise calculation of lensing bias in the pure MC pipeline (§IXB). 
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Appendix A: Wigner d-functions and spin-s spherical harmonics 


In this appendix, we briefly review properties of the Wigner d-function d as , (0) and spin-s spherical harmonics 
( s Yi ra ) which will be used in the text. 

For integers s,s', the Wigner d-function d e ss ,(9) is defined for i > max(|s|, |s'|) and satisfies the orthogonality 
condition: 

J d(c os 9) d[\, ( 9 ) d& ( 9 ) = ^^S ei e 2 (Al) 

as well as the identity: 

di s ,-A0) = diM = (-1 ) s+s 'diA0) (A2) 

The Wigner d-functions can be computed using the recursion relation 

«sJ 1 ( 6 ') d ^ 1 ( 61 ) - ( 2£ + !) (^cosd- d e ss ,(9) +a e ss ,{9)d e ~,\9) =0 {£ > max(|s|, |s'|)) (A3) 

where A ss , = \/(P — s 2 )(£ 2 — ( s ’) 2 )/£. If s' > |s|, then initial conditions for the recursion are given by: 


d s s A0) = 


(2s')! 


(s' -|s|)!(s' + |s|)! 


l+cosd\ (s ' +s)/2 /l^cos6>\ (s ' _s)/2 


(s' > |s|) 


(A4) 


Initial conditions for arbitrary (s, s') can be obtained through use of Eq. (A2). 

A spin-s field ( s f) is a field whose value at a point n depends on a choice {ei, e 2 } of local orthonormal frame at 
n, such that under a change of frame (ei ± ie 2 ) —>■ e ±ia (ei ± ie 2 ), the field value transforms as ( s /) —► e~ xsa ( s f). 
The spin-raising operator 6 and spin-lowering operator 6 are defined by: 

6 = m a V a 3 = m* a V a (A5) 

where we have defined the spin-1 vector field m a = (ei — ie 2 ). The spin-raising and spin-lowering operators transform 
a spin-s field to fields of spin (s + 1) and (s — 1) respectively. 

The spin-s spherical harmonics (. s T) m ) are an orthonormal basis for spin-s fields on the full sky, defined for t > |s| 
and —£ < m < t. The spin-raising and spin-lowering operators act on the ( s l) m ) by: 

3( s Yem) = \/ (£ ~ s )(^ + s + l)(s+l^)m) 3( s Y( m ) = —\J{£+ s)(£ — s + l)(s-lWm) (A6) 

The spin-s harmonics satisfy the identity: 

(sYem)* = (-1 ) s+m (- s Y^ m ) (A7) 

and are related to Wigner d-functions by the following sum rule: 


_ op 1 1 

E (sY em (n)) ^Y tm (n')r = < B >V) (AS) 

m——i 

where 9 = cos _1 (n • n') is the angle between unit vectors n,n'. The sum rule applies in the “two-point” frame where 
the local frame vectors ei, e[ at points n, n' have been chosen to point along the geodesic connecting the two points. 
(Note that the LHS of the sum rule depends on the choice of frames, but the RHS does not, so it must be understood 
that the sum rule applies only in the two-point frame.) 


Appendix B: Non-Gaussian simulations 

In [21], an algorithm was proposed for simulating a random, weakly non-Gaussian CMB realization with prescribed 
power spectrum and bispectrum. It is straightforward to generalize this algorithm to simulate a CMB realization with 
prescribed power spectrum Ce and trispectrum 3 m 4 • 
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First, a definition. For any angular trispectrum T, define the symmetric matrix Tw by: 


E /_I \m-\-m'rptii'l' 

v 1 ^ m{—m)m'{—m') * 


(Bl) 


This matrix arises in several contexts. First, if a^ m ,be m are all-sky Gaussian random fields, then Tpjj appears in the 
following expectation values: 


1 


(QAa]) = -Y^Twcrc? 


if 


K f m f ^mQr[o]) = 


cf 


2(2£ + 1) ■— 


E rri s-'iac 


(B2) 

(B3) 


Second, if the CMB is non-Gaussian, then the estimated CMB power spectrum Cg = (2£ + 1) 1 J2 m a em a £m contains 
a term proportional to Tw : 


^ ^ p/^2 


+ 


T„ 


(2e + i)(2e + i) 


(B4) 


Note that non-Gaussian power spectrum covariance due to the gravitational lensing trispectrum has been studied 
extensively (e.g. [90, 91]); Eq. (B4) generalizes to an arbitrary trispectrum. 

Our non-Gaussian simulation algorithm is as follows. We first simulate a Gaussian field af m with power spectrum 
Cp_. and then define the non-Gaussian field by: 


a?m = a ?m + \dimQ\afjCe\ ~ \ (21 +l)C e C r °*" 


(B5) 


A short calculation shows that the power spectrum and four-point function of the simulated field are given by: 

{a*? atm) =Ce + 0(T 2 ) = CL + °( B6 ) 

where 0(T 2 ) denotes contributions which are second-order in the trispectrum F^ 2 3 ^ 4 3m4 . The last term in Eq. (B5) 
has been included in order to avoid an order-O(T) correction to the power spectrum. (We note that odd (27V+l)-point 
correlation functions of are zero, and even (27V)-point connected correlation functions are of order 0(T W_1 ).) 

To apply the simulation algorithm, we need to compute Tw. In the case where T is contact factorizable we can 
do this using a method similar to the exact Fisher matrix algorithm from §VIB. We write Qt[o] in the abbreviated 
form: 


-| N fact p 

= i E / <i 2 «(n)M^'(n)M^(ii)M^ (n) + c.c. 


(B7) 


where we have defined M*{ n) = ^ a <m(s^m(n))' We can compute the expectation value (Qt[o\) using Wick’s 

theorem and the contraction 


Mf(n) M*'(n) = (-1 ) s £ 


(B8) 


obtaining: 


(«tW) = 4 x) — - 1>(2f +11 crcr 


1=1 W 


47T 




Comparing with Eq. (B2) we can read off an expression for Tw (note that we symmetrize in £,£'): 


+ c.c. 


(B9) 


-Nfact 


Tw — ~~ ~ ^2 (—l)“ J+7/ A{B I ( Cl,D\,5 ai -p I 5 1I -s I + (—1 ) ai+l31 A[Bl,ClD I ( ,5 ai - 1I 5p I -s I 


48-7T 


1=1 


+( — ApBj/Cg/Dpdaj^—SjSfjj ,—+ c.c. 


(BIO) 
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This algorithm for computing X 'w can be generalized to the exchange factorizable case, using the same strategy of 
computing (Qt[o]) with Wick’s theorem, but we omit the details. Note that the matrix T«/ may be useful outside 
the context of non-Gaussian simulations, since it appears in the non-Gaussian power spectrum covariance (B4). 

This generic simulation algorithm formally generates a non-Gaussian field whose power spectrum C(, and trispec¬ 
trum T are prescribed, up to contributions of order 0(T 2 ). A significant caveat is that for some trispectrum shapes, 
these 0(T 2 ) contributions can be large even for modest levels of non-Gaussianity. We have not experimented much 
with the simulation algorithm, but based on experience with the analogous bispectrum algorithm, we expect it it will 
work well for shapes which do not have large squeezed limits, for example the a 4 , d 2 (da) 2 , and (da) 4 shapes. 

The local trispectra g l fi c L and tnl have large squeezed limits, so we do not expect our generic simulation algorithm 
to work well in these cases. One alternate approach is to simulate 3D fields at the end of inflation, apply the relevant 
local operation (either C = Cg + (9/25)5jvlCg or C = Cg + t a/lCg' j ), and then apply the CMB transfer function to 
generate ae m ’s [92]. Another approach is to reweight terms in the generic algorithm to avoid infrared divergences in 
specific cases (see discussion in the appendix of [93]). 

The gravitational lensing trispectrum (63) is another example of a shape with a large squeezed limit, where we 
do not expect our generic simulation algorithm to work well. In this case the best approach is to simply simulate the 
lensing deflection T(n) —> T(n + V0(n)) directly [94]. 


Appendix C: Factorizable representations for gj}%, g^Li 9 nl 

In §V C, we calculated the factorizable representation explicitly for the trispectrum generated by the quartic operator 
a 2 (da) 2 . In this appendix, we do the same for the g\y c L trispectrum, and the trispectra generated by the operators a 4 
and (da) 4 . 


1- 9 nl shape 

First we consider the g^° L shape. The ((-trispectrum was given previously in Eq. (2): 

'54 


(Ck 1 Ck 2 Ck 3 Ck 4 )c = ( ^9NL P d k i) P d k 3) P d k 4) + 3 perm.j (2 tt) 3 (5 3 ( ^ 


(Cl) 


Following the previous calculation in §VC, we plug into the projection formula (37), replace the delta function 
(5 3 (J0 kj) by f d 3 rexp(i ]>/ k, • r), and do the angular parts of the k integrals, obtaining: 

9 f 00 f ( 4 f 2 k 2 dk ■ \ 

Qt[o] = —fife / f 2 dr / d 2 n I H / — 1 - - E Ui(h r )^h(ki)at imi Y(, imi (n) J P^k-^P^k^P^ki) 

"'° Vi—1 * ^ ltr/ii J 

= 9nl J r 2 dr J d 2 n f ^ a e (r)a em Y em (n)\ ( ^ (r)a e > m 'Y e >m' (n)'j (C2) 


where the functions a^(r), pg(r) were defined previously in Eq. (61). Comparing with the definition (51), we see that 
after replacing the r integral by a finite sum, the trispectrum is contact factorizable with all spins equal to zero. 


2. g% L shape 


Next we consider the quartic operator & 4 . The £-trispectrum was given previously in Eq. (5): 

(Ck 1 Ck 2 Ck 3 Ck 4 >c = j dr E T% ( -j- j (2 7 r) 3 5 3 (y]k i ) 

Using the same method of calculation as in §VC„ we find the following expression for Qt[o]- 


(C3) 


384 ‘ 3 > 4 f° dr 


Qr[a] — -^rA^-g^L 


E / 

— oo J 0 


drr|r 2 / d 2 n J]^ 


2 k 2 dki . ,, , A ,, K e‘ 


u=i ( 


kiT E 

jii (kir)A e . (ki)— — ae imi Y eim . (n) 
7r ki 


(C4) 
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Using the notation nz(r E ,r) defined previously in Eq. (57), we rewrite this as: 

( \ 4 
Y Ah(te, r)ae m Ye m (n) 

lm ) 

After replacing the ( r E ,r) double integral by a finite sum, Qt[o\ is of contact factorizable form (50). 


(C5) 


3. g^ shape 

Finally we consider the case of a quartic operator (di<j) 2 (djcr) 2 . The (-trispectrum was given previously in Eq. (7): 


/a {■ {- (■ \ — 82944 (d<r) 4 ,3 

\<,kiCk 2 (,k 3 <,k 4 /c — 9575 9 nL 


dT * n 


(1 -kiT E )e kiTE ' 


Ki =1 


kf 


(ki-k 2 )(k 3 -k 4 )+2 perm.^ (27 t) 3 <5 3 ( Y ,O (C 6 ) 


In this case we find the following expression for Qt[o]' 


/o r 1 1152 (Oo -) 4 

Qt H = - 9nl 


^ dr E [ d 3 r <9jF ( Te ’ r ) dF ( T E, r ) dF ( T E, r ) 9F(t e , r) 


<9r, ; 


dr 


dri 


d r. 


where we have defined 

F(t e , r) = 


3\ 1/2 

5/ 


^ 3/4 f 2/c“ dk y-> 


(1 — k,T E )e kTE . 


tm 


k 3 


je(kr)A e (k)a em Y em (r) 


As in §V C we can split the gradients into radial and tangential terms using the identity: 

2 




(C7) 


(C 8 ) 


(C9) 


where the real and tangential derivatives can be written in terms of the functions v, oj introduced previously in Eq. (57): 

I '' 1 = ^ M t e, r)ae m Ygm(r) dF{r E ,r) = rY^i( T E,r)ae m {iY im (r)) (CIO) 

tm tm 

Plugging this in we get the representation of Qt[o\ in contact factorizable form: 
n r 1 1152 (g a y 

Qt H = ~[^9nl 

|_ \ tm / I tm I 

(Cll) 


rO />00 n 

/ drE / drr 2 d 2 n 

(V~l ^(TE, r)ffllmYim(n) 

2 

1 + 

Y ue{r E , r)a,£ m {iY em (n)) 

2 " 

/-oo Jo J 

V tm ) 


tm 



4. Deviation from scale invariance 

We have now obtained explicit factorizable representations for the shapes {g x § c L , 9 nli Qnl^ but have assumed 
a scale invariant power spectrum P^(fc) = A^k~ 3 throughout. To compute these shapes in the WMAP or Planck 
cosmologies, we need to generalize slightly to the case of a power-law spectrum P^(fc) = A,yk ns ~ 4 . 

In fact, our factorizable representations have been written in such a way that they generalize to an arbitrary 
power spectrum, by simply plugging it in whenever P^(fc) appears in the definitions (Eqs. (57), (61)) of the functions 
ae(k), Pt{k), ne(k), ve(r E ,k), and Lo e (T E ,k). 

This prescription has several nice properties. First, it gives the correct trispectrum in the local case, i.e. when 
( = (<3 + g l E c L Ca with arbitrary P^(fc). Second, it is the analogue of the prescription which is commonly used for the 
bispectrum (e.g. Eqs. (51)-(53) of [95]). Finally, for a power-law spectrum P<y{k) = A ( A’ n< ’ -4 , the ((-trispectrum scales 
under dilations as: 

(CAkiCAkXAkaCAkJ' = A 3 ( " s_ 4 ) (C k iCk 2 Ck 3 Ck 4 Y (Cl 2 ) 

In the case of trispectra other than the local one, the deviation from scale invariance of the bispectrum and trispectrum 
cannot be reconstructed from the tilt of the power spectrum [96]. The above parametrization is however the closer 
guess to the actual dependence we can expect, and it is the correct one for the local case. 
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Appendix D: Numerical calculation of trispectra 

We have now written down factorizable representations for the local, a 4 , and (da) 4 trispectra. In this appendix we 
discuss computational issues in calculating these trispectra numerically. The chain of steps is: 

1. We precompute the CMB transfer function A i(k) on a grid of k- values. 

2. Each trispectrum shape is represented either as a single integral over r (in the case of the local shape, Eq. (C2)) 
or double integral over ( r E ,r ) (in the case of the a 4 and (da) 4 shapes, Eqs. (C5) and (Cll)). We choose a finite 
sampling for this integral, in order to obtain a factorizable representation with a finite number of terms. 

3. For each point in the (te,t) plane, we compute the functions ap(r), /3e(r), p^(te,t), U((te,t), and w^(rE,r) 
appearing in the factorizable representation by evaluating the appropriate fc-integral (Eqs. (57), (61)). 

Let us consider each of these steps in detail, starting with the CMB transfer function A e(k). The transfer function 
can be written as a line-of-sight integral [97]: 

A e (k)= J d X S(x,k)je(k X ) (Dl) 

We obtain the source function S(x, k) from CAMB [83] and evaluate the above integral using equal spacing Ay. We 
compute the transfer function up to maximum wavenumber fc max on a grid of /c-values defined using the fc-dependent 
step size A k = min(efc, Ko )■ This sampling scheme switches from equal spacing in log(fc) at low -k to equal spacing in 
k at high-fc. We choose the following default values for the parameters just defined: 

(A X , fc max , e, k 0 ) = (0.5,5000r^ z , 2 x 10" 3 ,3 x 10" 5 ) (D2) 

Next consider discretization of the (te,t) integral (step 2 above). We discretize the outer te integral using equally 
spaced points in log \te\ from initial time r E i to final time r E f. Our default parameter values are: 

(T E i,T E f, A log \t e |) = f-10 6 Mpc,- 5 ^ MP ° , log ^ 10 ^ (D3) 

For each t e , we discretize the inner r integral as follows. Let r rec and n lor i z be the comoving distance to recombination 
and the causal horizon respectively. The integral formally goes to r = oo, but the integrand decays beyond rhoriz, 
with characteristic decay scale given by |te| plus the sound horizon. Therefore, we integrate from r = 0 to r max = 
Hioriz + Po + o:|te|, with default parameter values 

(p 0 , a) = (2000 Mpc, 10) (D4) 

We sample the r integral with spacing (Ar)i from r = 0 to r rec , spacing (Ar )2 from r rec to rhoriz, and spacing (Ar)i 
from r horiz to r max , where 

(Ar)i = max (pi, (3\t e \) (Ar) 2 = max (p 2 , P\t e \) (E>5) 

with default parameter values: 

(pi,p 2 ,P) = (50 Mpc,5 Mpc, 0.1) (D6) 

In the case where the trispectrum is represented as a single integral over r, rather than a double integral over (r E ,r), 
we use the ?’-sampling for t e = 0. 

Finally, consider evaluation of k- integrals (step 3 above). We sample the integrals at the same values of k where 
the transfer function is computed as described previously. The fc-integrals include factors of either the spherical Bessel 
function ji(x) or its derivative j[(x). We precompute je(x) using Steed’s algorithm [98] on a regularly spaced grid 
with Ax = 0.2 and interpolate to arbitrary x. To evaluate j[(x) we use the identity: 


3e(x) = 2 ^ +l je-i(x) - (for t > 1). (D7) 

This concludes our description of the numerics. To verify that numerical errors are fully controlled, we use the following 
end-to-end convergence test. In the above discussion we defined tolerance parameters controlling the accuracy of the 
integration. We compute an “improved” trispectrum using more conservative values of tolerance parameters as follows: 


(te*, r E f, A log |te|, po, pi, p 2 , /5, &max5 e,«o, Ax, Ax) 



tej_ 2 
10 ’ 3 


log |te|, 2p 0 



P2 
2 ’ 



ko Ax Ax\ 
2 ’ 2 ’ 2 ) 


(D8) 
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and also adjusting several parameters in CAMB. We then verify that the original and improved trispectra are nearly 
equal, in the metric defined by the Fisher matrix. For this comparison, we do not optimize the trispectra, since the 
number of terms A/a.ct i n the improved trispectrum will be very large, and the optimization algorithm will be too slow, 
but computing the Fisher matrix is still affordable using the Monte Carlo algorithm from §VI A. This end-to-end test 
is a complete check that any numerical errors in our calculation of the trispectrum are not observationally important. 


Appendix E: Constructing the estimator F 

In the optimal pipeline (§IXA) we introduced the following estimator: 

F = | {(d em Q[bM)Ci^ e , ml (de' m 'Q[b, M]) + {d tm Q[b', b', b'})C^ tlrn ,(d t , m 'Q[b',V, b 1 }) 

+ C + (dtmQW* ,V]) 


((^ m Q[6,6,&])C'7 m 1 i ,, rn ,(^ m ,Q[6,&',6']) + (d tm Q[V,V,l/])C^, ml (d em 'Q[bAV])) (El) 


with coefficients (a,(3,-y) = (1/16,9/16, —3/8). In this appendix we explain how these coefficients were chosen. 

We use a “contraction” notation in which each line between factors of T denotes one factor of £ , m , conti 
with the appropriate indices. For example, the quantity F defined in Eq. (25) could be denoted: 


i /Nil-INK 1 ^ 

F = s( T T ) = a X r. 


^ i mim 2 m 3 m 4 
£irrii£'-m 


/ ^^2^2 i^ m 2 ^^3 m 3 ^3 m 3 /4WI4 m^m^m^m'^ ^ ' 


and, as another example: 
( 


n 11 11 n 

T T 


— 1 — 1 — 1 — 1 rj-1^2^2^3^4 

mi 77137714 u fimi w £2^2 5^2 m 2 V_/ ^3 m 35^3 m 3 W -^4rri4, m^m^m^m^ 

iimii'-m’- 


(E3) 


Our estimator F should have the property that ( F) = F. It is easy to calculate the expectation value of Eq. (El), 
obtaining: 


a 


^ ' 6 + 18 J 


£)( 


T 


a (3 


7 


\ / n I r 

T ) + (- + — + — ( T 
1 1 4 36 12 / V 


n l n 
T 


(E4) 


We see that ( F ) = F if the coefficients (a, /?, 7 ) satisfy the constraints: 


a (3 1 

6 + 18 ~ 24 


a + ^ + ^ = o 

4 36 12 


(E5) 


These constraints do not fully determine (a,/3, 7 ); there is a 1-parameter family of solutions. We noticed empirically 
that the choice (a,/3, 7 ) = (1/16,9/16, —3/8) nearly minimizes the variance Var(F), and even a small change in these 
coefficients results in a dramatically larger value of Var(E). We subsequently found a semianalytic explanation for 
this phenomenon as follows. The variance Var(F’) is a sixteen-point function which can be expanded using Wick’s 
theorem as a sum of many terms. One of these terms is: 


/ 4 \ 2 / n 1 1 — 



“Cl n s 

( 12a + -(3 + 4 7 J (T 

r) 

(r 

t) 


(E 6 ) 


We can speculate that this term will dominate Var(F), since it “maximally factors” in the sense defined in [99]. If we 
set this term to zero by imposing the constraint 12a + (4/3)/3 + 4y = 0 in addition to the constraints (E5), then we 
obtain the coefficients (a,/ 3 , 7 ) = (1/16,9/16, —3/8). 
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Appendix F: Efficient evaluation of Fy and Var (Fv) 


In the pure MC pipeline (§IXB), we gave expressions for estimators Fy and X, used to estimate the statistical 
error on g^Li and the “error on the error”. As given (in Eqs. (90) and (91)), these expressions have computational 
cost 0(N^ C ) and 0(N^ C ) respectively. In this appendix we give mathematically equivalent expressions with cost 
0{Nl c ) and 0(1V* C ). 

We decompose R^ = Sij + T) + Tj + U, where we have defined 


U = 


E ij -fry 


N{N — 1) 


T, = 


Ej Rij E,,/*’ 


jk n jk 


N - 2 N(N - 2) 


Q _ J 

°ij ~ 


Rij — Ti-Tj-U ifi^j 

0 if i = j 


(FI) 


This is the unique decomposition Ryj = Sij + Ti + Tj + U (for i ^ j) satisfying ]TX Tj = JT Sij = Su = 0. In terms 
of the new variables S, T, U, a long computer algebra assisted calculation gives the following alternate forms for Fy 
and X: 
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